How to implement float hashing with approximate equalityHow to implement Continuous Delivery with Java webapp?How to test if a hashing algorithm is good?How Lua handles both integer and float numbers?How do crackers determine number of iterations of a Hashing algorithm?How to ensure objects unique by equality?How to convert byte-array (4 bytes) back into float?Should `Vector<float>.Equals` be reflexive or should it follow IEEE 754 semantics?Avoiding Division by Zero Using Float ComparisonHow does Pearson hashing compare with other non-cryptographic hashing algorithms?Does comparing equality of float numbers mislead junior developers even if no rounding error occurs in my case?
Is this strange Morse signal type common?
How can I test a shell script in a "safe environment" to avoid harm to my computer?
As a small race with a heavy weapon, does enlage remove the disadvantage?
Why doesn't increasing the temperature of something like wood or paper set them on fire?
Is it a good idea to copy a trader when investing?
I want to write a blog post building upon someone else's paper, how can I properly cite/credit them?
Is there an application which does HTTP PUT?
Align a table column at a specific symbol
Why is the episode called "The Last of the Starks"?
Gift for mentor after his thesis defense?
Why did Ham the Chimp push levers?
How to append code verbatim to .bashrc?
GLM: Modelling proportional data - account for variation in total sample size
How long can fsck take on a 30 TB volume?
Expl3 and recent xparse on overleaf: No expl3 loader detected
The unknown and unexplained in science fiction
Is it safe to keep the GPU on 100% utilization for a very long time?
I'm attempting to understand my 401k match and how much I need to contribute to maximize the match
What is the Ancient One's mistake?
Should one save up to purchase a house/condo or maximize their 401(k) first?
Can I bring back Planetary Romance as a genre?
Whose birthyears are canonically established in the MCU?
Why is it wrong to *implement* myself a known, published, widely believed to be secure crypto algorithm?
How do I minimise waste on a flight?
How to implement float hashing with approximate equality
How to implement Continuous Delivery with Java webapp?How to test if a hashing algorithm is good?How Lua handles both integer and float numbers?How do crackers determine number of iterations of a Hashing algorithm?How to ensure objects unique by equality?How to convert byte-array (4 bytes) back into float?Should `Vector<float>.Equals` be reflexive or should it follow IEEE 754 semantics?Avoiding Division by Zero Using Float ComparisonHow does Pearson hashing compare with other non-cryptographic hashing algorithms?Does comparing equality of float numbers mislead junior developers even if no rounding error occurs in my case?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
Let's say we have the following Python class (the problem exists in Java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
|
show 2 more comments
Let's say we have the following Python class (the problem exists in Java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
8
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
8
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
6
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
4
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call itkelvin
?
– Solomon Ucko
Apr 29 at 12:01
5
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07
|
show 2 more comments
Let's say we have the following Python class (the problem exists in Java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
Let's say we have the following Python class (the problem exists in Java just the same with equals
and hashCode
)
class Temperature:
def __init__(self, degrees):
self.degrees = degrees
where degrees
is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature
in a way that
- compares floats up to an epsilon difference instead of direct equality testing,
- and honors the contract that
a == b
implieshash(a) == hash(b)
.
def __eq__(self, other):
return abs(self.degrees - other.degrees) < EPSILON
def __hash__(self):
return # What goes here?
The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0)
but this is not quite the same problem.
Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?
Update: Now I understand that this type of equality testing for floats eliminates the transitivity of ==
and equals
. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?
java python hashing floating-point
java python hashing floating-point
edited Apr 30 at 9:43
Glorfindel
2,39541727
2,39541727
asked Apr 28 at 23:41
CQQLCQQL
17916
17916
8
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
8
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
6
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
4
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call itkelvin
?
– Solomon Ucko
Apr 29 at 12:01
5
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07
|
show 2 more comments
8
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
8
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
6
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
4
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call itkelvin
?
– Solomon Ucko
Apr 29 at 12:01
5
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07
8
8
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
8
8
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
6
6
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
4
4
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it
kelvin
?– Solomon Ucko
Apr 29 at 12:01
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it
kelvin
?– Solomon Ucko
Apr 29 at 12:01
5
5
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07
|
show 2 more comments
6 Answers
6
active
oldest
votes
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point==
should "infect" the==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used onTemperature
. It's the only thing you can do, really.
– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have afloat approximation
field which does not participate in==
. Besides, the static analysis tool will already give a warning inside the==
implementation of classes when one of the members being compared is afloat
type.
– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has afloat
field that doesn't participate in==
, then don't configure your tool to warn on==
on that class. If the class does, then presumably marking the class's==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if@Deprecated void foo()
, thenvoid bar() foo();
is a warning, but@Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.
– HTNW
Apr 30 at 12:24
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
add a comment |
You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.
Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.
BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!
#Pseudo code
class Temperature:
def __init__(self, degrees):
#CHECK INVALID VALUES HERE
#TRANSFORM TO KELVIN HERE
self.degrees = Math.floor(kelvin/EPSILON)
add a comment |
Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:
Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.
Store each item within the hash table using keys that are above and below the value being sought.
Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
add a comment |
You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.
add a comment |
This isn't an answer, but an extended comment that may be helpful.
I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.
I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2
will yield 2
instead of 2.00000001
or similar. Of course, this will introduce additional complications that may or may not be worth it.
add a comment |
protected by gnat Apr 30 at 5:15
Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point==
should "infect" the==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used onTemperature
. It's the only thing you can do, really.
– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have afloat approximation
field which does not participate in==
. Besides, the static analysis tool will already give a warning inside the==
implementation of classes when one of the members being compared is afloat
type.
– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has afloat
field that doesn't participate in==
, then don't configure your tool to warn on==
on that class. If the class does, then presumably marking the class's==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if@Deprecated void foo()
, thenvoid bar() foo();
is a warning, but@Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.
– HTNW
Apr 30 at 12:24
add a comment |
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point==
should "infect" the==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used onTemperature
. It's the only thing you can do, really.
– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have afloat approximation
field which does not participate in==
. Besides, the static analysis tool will already give a warning inside the==
implementation of classes when one of the members being compared is afloat
type.
– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has afloat
field that doesn't participate in==
, then don't configure your tool to warn on==
on that class. If the class does, then presumably marking the class's==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if@Deprecated void foo()
, thenvoid bar() foo();
is a warning, but@Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.
– HTNW
Apr 30 at 12:24
add a comment |
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,
Fuzzy equality violates the requirements that Java places on the equals
method, namely transitivity, i.e. that if x == y
and y == z
, then x == z
. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2
and 0.2 == 0.3
, but 0.1 == 0.3
does not hold.
While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.
So I strongly recommend you don't do that.
Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.
(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
answered Apr 29 at 6:30
Sebastian RedlSebastian Redl
11.7k63842
11.7k63842
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point==
should "infect" the==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used onTemperature
. It's the only thing you can do, really.
– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have afloat approximation
field which does not participate in==
. Besides, the static analysis tool will already give a warning inside the==
implementation of classes when one of the members being compared is afloat
type.
– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has afloat
field that doesn't participate in==
, then don't configure your tool to warn on==
on that class. If the class does, then presumably marking the class's==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if@Deprecated void foo()
, thenvoid bar() foo();
is a warning, but@Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.
– HTNW
Apr 30 at 12:24
add a comment |
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point==
should "infect" the==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used onTemperature
. It's the only thing you can do, really.
– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have afloat approximation
field which does not participate in==
. Besides, the static analysis tool will already give a warning inside the==
implementation of classes when one of the members being compared is afloat
type.
– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has afloat
field that doesn't participate in==
, then don't configure your tool to warn on==
on that class. If the class does, then presumably marking the class's==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if@Deprecated void foo()
, thenvoid bar() foo();
is a warning, but@Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.
– HTNW
Apr 30 at 12:24
2
2
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?
– Christophe
Apr 29 at 6:51
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.
– Ordous
Apr 29 at 14:50
Addressing @OP's edit, I would add that the incorrectness of floating-point
==
should "infect" the ==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature
. It's the only thing you can do, really.– HTNW
Apr 29 at 16:56
Addressing @OP's edit, I would add that the incorrectness of floating-point
==
should "infect" the ==
of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature
. It's the only thing you can do, really.– HTNW
Apr 29 at 16:56
@HTNW: That would be too simple. A ratio class might have a
float approximation
field which does not participate in ==
. Besides, the static analysis tool will already give a warning inside the ==
implementation of classes when one of the members being compared is a float
type.– MSalters
Apr 30 at 9:58
@HTNW: That would be too simple. A ratio class might have a
float approximation
field which does not participate in ==
. Besides, the static analysis tool will already give a warning inside the ==
implementation of classes when one of the members being compared is a float
type.– MSalters
Apr 30 at 9:58
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a
float
field that doesn't participate in ==
, then don't configure your tool to warn on ==
on that class. If the class does, then presumably marking the class's ==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo()
, then void bar() foo();
is a warning, but @Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.– HTNW
Apr 30 at 12:24
@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a
float
field that doesn't participate in ==
, then don't configure your tool to warn on ==
on that class. If the class does, then presumably marking the class's ==
as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo()
, then void bar() foo();
is a warning, but @Deprecated void bar() foo();
is not. Maybe many tools don't support this, but some might.– HTNW
Apr 30 at 12:24
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
add a comment |
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
Good Luck
You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.
Example:
Assume that each point hashes to its own unique hash value.
As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.
For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.
- Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.
There are a few cases where this will not hold true:
- Positive/Negative Infinity
- NaN
- A few De-normalised ranges that may not be linkable to the main range for a given epsilon.
- perhaps a few other format specific instances
However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
Outcome
Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).
Or the epsilon is such that only exact matches are permitted.
Granular
You could of course go for a granular approach instead.
Under this approach you define exact buckets down to a particular resolution. ie:
[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...
Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.
Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
answered Apr 29 at 2:13
Kain0_0Kain0_0
4,832420
4,832420
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
add a comment |
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
2
2
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.
– Neil
Apr 29 at 6:35
4
4
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.
– J. Fabian Meier
Apr 29 at 8:59
2
2
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.
– Doc Brown
Apr 29 at 15:26
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.
– Carsten S
Apr 30 at 9:43
1
1
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.
– Kain0_0
Apr 30 at 23:50
add a comment |
You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.
Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.
BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!
#Pseudo code
class Temperature:
def __init__(self, degrees):
#CHECK INVALID VALUES HERE
#TRANSFORM TO KELVIN HERE
self.degrees = Math.floor(kelvin/EPSILON)
add a comment |
You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.
Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.
BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!
#Pseudo code
class Temperature:
def __init__(self, degrees):
#CHECK INVALID VALUES HERE
#TRANSFORM TO KELVIN HERE
self.degrees = Math.floor(kelvin/EPSILON)
add a comment |
You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.
Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.
BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!
#Pseudo code
class Temperature:
def __init__(self, degrees):
#CHECK INVALID VALUES HERE
#TRANSFORM TO KELVIN HERE
self.degrees = Math.floor(kelvin/EPSILON)
You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.
Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.
BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!
#Pseudo code
class Temperature:
def __init__(self, degrees):
#CHECK INVALID VALUES HERE
#TRANSFORM TO KELVIN HERE
self.degrees = Math.floor(kelvin/EPSILON)
edited Apr 30 at 9:45
Glorfindel
2,39541727
2,39541727
answered Apr 29 at 21:55
Alessandro TeruzziAlessandro Teruzzi
1994
1994
add a comment |
add a comment |
Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:
Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.
Store each item within the hash table using keys that are above and below the value being sought.
Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
add a comment |
Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:
Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.
Store each item within the hash table using keys that are above and below the value being sought.
Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
add a comment |
Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:
Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.
Store each item within the hash table using keys that are above and below the value being sought.
Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:
Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.
Store each item within the hash table using keys that are above and below the value being sought.
Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
answered Apr 29 at 15:50
supercatsupercat
7,1661727
7,1661727
add a comment |
add a comment |
You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.
add a comment |
You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.
add a comment |
You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.
You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.
answered Apr 29 at 14:00
gnasher729gnasher729
21.1k22762
21.1k22762
add a comment |
add a comment |
This isn't an answer, but an extended comment that may be helpful.
I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.
I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2
will yield 2
instead of 2.00000001
or similar. Of course, this will introduce additional complications that may or may not be worth it.
add a comment |
This isn't an answer, but an extended comment that may be helpful.
I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.
I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2
will yield 2
instead of 2.00000001
or similar. Of course, this will introduce additional complications that may or may not be worth it.
add a comment |
This isn't an answer, but an extended comment that may be helpful.
I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.
I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2
will yield 2
instead of 2.00000001
or similar. Of course, this will introduce additional complications that may or may not be worth it.
This isn't an answer, but an extended comment that may be helpful.
I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.
I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2
will yield 2
instead of 2.00000001
or similar. Of course, this will introduce additional complications that may or may not be worth it.
answered Apr 29 at 14:45
BurnsBABurnsBA
1011
1011
add a comment |
add a comment |
protected by gnat Apr 30 at 5:15
Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?
8
why is the question has Java's tag?
– Laiv
Apr 29 at 7:53
8
About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.
– J. Fabian Meier
Apr 29 at 9:04
6
@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...
– Matthieu M.
Apr 29 at 11:12
4
Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it
kelvin
?– Solomon Ucko
Apr 29 at 12:01
5
Python has more-or-less excellent fixed-point support, maybe that’s something for you.
– Jonas Schäfer
Apr 29 at 14:07