How to implement float hashing with approximate equalityHow to implement Continuous Delivery with Java webapp?How to test if a hashing algorithm is good?How Lua handles both integer and float numbers?How do crackers determine number of iterations of a Hashing algorithm?How to ensure objects unique by equality?How to convert byte-array (4 bytes) back into float?Should `Vector<float>.Equals` be reflexive or should it follow IEEE 754 semantics?Avoiding Division by Zero Using Float ComparisonHow does Pearson hashing compare with other non-cryptographic hashing algorithms?Does comparing equality of float numbers mislead junior developers even if no rounding error occurs in my case?

Is this strange Morse signal type common?

How can I test a shell script in a "safe environment" to avoid harm to my computer?

As a small race with a heavy weapon, does enlage remove the disadvantage?

Why doesn't increasing the temperature of something like wood or paper set them on fire?

Is it a good idea to copy a trader when investing?

I want to write a blog post building upon someone else's paper, how can I properly cite/credit them?

Is there an application which does HTTP PUT?

Align a table column at a specific symbol

Why is the episode called "The Last of the Starks"?

Gift for mentor after his thesis defense?

Why did Ham the Chimp push levers?

How to append code verbatim to .bashrc?

GLM: Modelling proportional data - account for variation in total sample size

How long can fsck take on a 30 TB volume?

Expl3 and recent xparse on overleaf: No expl3 loader detected

The unknown and unexplained in science fiction

Is it safe to keep the GPU on 100% utilization for a very long time?

I'm attempting to understand my 401k match and how much I need to contribute to maximize the match

What is the Ancient One's mistake?

Should one save up to purchase a house/condo or maximize their 401(k) first?

Can I bring back Planetary Romance as a genre?

Whose birthyears are canonically established in the MCU?

Why is it wrong to *implement* myself a known, published, widely believed to be secure crypto algorithm?

How do I minimise waste on a flight?

How to implement float hashing with approximate equality

How to implement Continuous Delivery with Java webapp?How to test if a hashing algorithm is good?How Lua handles both integer and float numbers?How do crackers determine number of iterations of a Hashing algorithm?How to ensure objects unique by equality?How to convert byte-array (4 bytes) back into float?Should `Vector<float>.Equals` be reflexive or should it follow IEEE 754 semantics?Avoiding Division by Zero Using Float ComparisonHow does Pearson hashing compare with other non-cryptographic hashing algorithms?Does comparing equality of float numbers mislead junior developers even if no rounding error occurs in my case?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;

Let's say we have the following Python class (the problem exists in Java just the same with equals and hashCode)

class Temperature:
 def __init__(self, degrees):
 self.degrees = degrees

where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that

compares floats up to an epsilon difference instead of direct equality testing,

and honors the contract that a == b implies hash(a) == hash(b).

def __eq__(self, other):
 return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
 return # What goes here?

The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.

Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?

Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

8

why is the question has Java's tag?

– Laiv
Apr 29 at 7:53

8

About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
Apr 29 at 9:04

6

@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
Apr 29 at 11:12

4

Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

– Solomon Ucko
Apr 29 at 12:01

5

Python has more-or-less excellent fixed-point support, maybe that’s something for you.

– Jonas Schäfer
Apr 29 at 14:07

|
show 2 more comments

Let's say we have the following Python class (the problem exists in Java just the same with equals and hashCode)

class Temperature:
 def __init__(self, degrees):
 self.degrees = degrees

where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that

compares floats up to an epsilon difference instead of direct equality testing,

and honors the contract that a == b implies hash(a) == hash(b).

def __eq__(self, other):
 return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
 return # What goes here?

The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.

Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

8

why is the question has Java's tag?

– Laiv
Apr 29 at 7:53

8

About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
Apr 29 at 9:04

6

@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
Apr 29 at 11:12

4

Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

– Solomon Ucko
Apr 29 at 12:01

5

Python has more-or-less excellent fixed-point support, maybe that’s something for you.

– Jonas Schäfer
Apr 29 at 14:07

|
show 2 more comments

Let's say we have the following Python class (the problem exists in Java just the same with equals and hashCode)

class Temperature:
 def __init__(self, degrees):
 self.degrees = degrees

where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that

compares floats up to an epsilon difference instead of direct equality testing,

and honors the contract that a == b implies hash(a) == hash(b).

def __eq__(self, other):
 return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
 return # What goes here?

The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.

Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

Let's say we have the following Python class (the problem exists in Java just the same with equals and hashCode)

class Temperature:
 def __init__(self, degrees):
 self.degrees = degrees

where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that

compares floats up to an epsilon difference instead of direct equality testing,

and honors the contract that a == b implies hash(a) == hash(b).

def __eq__(self, other):
 return abs(self.degrees - other.degrees) < EPSILON

def __hash__(self):
 return # What goes here?

The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.

Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?

java python hashing floating-point

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

edited Apr 30 at 9:43

Glorfindel

2,39541727

edited Apr 30 at 9:43

Glorfindel

2,39541727

edited Apr 30 at 9:43

Glorfindel

2,39541727

asked Apr 28 at 23:41

CQQL

17916

asked Apr 28 at 23:41

CQQL

17916

asked Apr 28 at 23:41

CQQL

17916

8

why is the question has Java's tag?

– Laiv
Apr 29 at 7:53

8

About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
Apr 29 at 9:04

6

@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
Apr 29 at 11:12

4

Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

– Solomon Ucko
Apr 29 at 12:01

5

Python has more-or-less excellent fixed-point support, maybe that’s something for you.

– Jonas Schäfer
Apr 29 at 14:07

|
show 2 more comments

8

why is the question has Java's tag?

– Laiv
Apr 29 at 7:53

8

About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
Apr 29 at 9:04

6

@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
Apr 29 at 11:12

4

Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

– Solomon Ucko
Apr 29 at 12:01

5

Python has more-or-less excellent fixed-point support, maybe that’s something for you.

– Jonas Schäfer
Apr 29 at 14:07

why is the question has Java's tag?

– Laiv
Apr 29 at 7:53

About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

– J. Fabian Meier
Apr 29 at 9:04

@Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply used a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

– Matthieu M.
Apr 29 at 11:12

Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

– Solomon Ucko
Apr 29 at 12:01

Python has more-or-less excellent fixed-point support, maybe that’s something for you.

– Jonas Schäfer
Apr 29 at 14:07

|
show 2 more comments

6 Answers
6

active

oldest

votes

implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,

Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.

While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.

So I strongly recommend you don't do that.

Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.

(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

2

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

add a comment |

Good Luck

You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.

Example:

Assume that each point hashes to its own unique hash value.

As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.

For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.

Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.

There are a few cases where this will not hold true:

Positive/Negative Infinity

A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.

Outcome

Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).

Or the epsilon is such that only exact matches are permitted.

Granular

You could of course go for a granular approach instead.

Under this approach you define exact buckets down to a particular resolution. ie:

[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...

Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.

Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.

answered Apr 29 at 2:13

Kain0_0

4,832420

2

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

4

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

2

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

1

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

add a comment |

You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius). So, double (-273.15 is equal to 0 for your underlying integer). The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.

Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.

BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!

#Pseudo code
class Temperature:
 def __init__(self, degrees):
 #CHECK INVALID VALUES HERE
 #TRANSFORM TO KELVIN HERE
 self.degrees = Math.floor(kelvin/EPSILON)

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

add a comment |

Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:

Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.

Store each item within the hash table using keys that are above and below the value being sought.

Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.

answered Apr 29 at 15:50

supercat

7,1661727

add a comment |

You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.

There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==“. Sometimes you don’t want to check for equality, but when you do, “==“ is the way to go.

answered Apr 29 at 14:00

gnasher729

21.1k22762

add a comment |

This isn't an answer, but an extended comment that may be helpful.

I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.

I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding, for example, 7*√2 - 5*√2 will yield 2 instead of 2.00000001 or similar. Of course, this will introduce additional complications that may or may not be worth it.

answered Apr 29 at 14:45

BurnsBA

1011

add a comment |

protected by gnat Apr 30 at 5:15

Thank you for your interest in this question.
Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).

Would you like to answer one of these unanswered questions instead?

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,

While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.

So I strongly recommend you don't do that.

(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

2

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

add a comment |

implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,

While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.

So I strongly recommend you don't do that.

(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

2

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

add a comment |

implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,

While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.

So I strongly recommend you don't do that.

(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,

While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.

So I strongly recommend you don't do that.

(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

answered Apr 29 at 6:30

Sebastian Redl

11.7k63842

2

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

add a comment |

2

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

– Christophe
Apr 29 at 6:51

@Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

– Ordous
Apr 29 at 14:50

Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

– HTNW
Apr 29 at 16:56

@HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

– MSalters
Apr 30 at 9:58

@MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

– HTNW
Apr 30 at 12:24

add a comment |

Good Luck

You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.

Example:

Assume that each point hashes to its own unique hash value.

For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.

Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.

There are a few cases where this will not hold true:

Positive/Negative Infinity

A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.

Outcome

Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).

Or the epsilon is such that only exact matches are permitted.

Granular

You could of course go for a granular approach instead.

Under this approach you define exact buckets down to a particular resolution. ie:

[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...

Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.

Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.

answered Apr 29 at 2:13

Kain0_0

4,832420

2

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

4

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

2

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

1

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

add a comment |

Good Luck

You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.

Example:

Assume that each point hashes to its own unique hash value.

For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.

Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.

There are a few cases where this will not hold true:

Positive/Negative Infinity

A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.

Outcome

Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).

Or the epsilon is such that only exact matches are permitted.

Granular

You could of course go for a granular approach instead.

Under this approach you define exact buckets down to a particular resolution. ie:

[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...

Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.

Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.

answered Apr 29 at 2:13

Kain0_0

4,832420

2

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

4

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

2

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

1

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

add a comment |

Good Luck

You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.

Example:

Assume that each point hashes to its own unique hash value.

For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.

Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.

There are a few cases where this will not hold true:

Positive/Negative Infinity

A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.

Outcome

Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).

Or the epsilon is such that only exact matches are permitted.

Granular

You could of course go for a granular approach instead.

Under this approach you define exact buckets down to a particular resolution. ie:

[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...

Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.

Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.

answered Apr 29 at 2:13

Kain0_0

4,832420

Good Luck

You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.

Example:

Assume that each point hashes to its own unique hash value.

For each two points within epsilon of each other that do not share the same hash value.
- Adjust the hashing scheme so that these two points hash to the same value.

Inducting for all such pairs the entire sequence of floating point numbers will collapse toward a single has value.

There are a few cases where this will not hold true:

Positive/Negative Infinity

A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.

Outcome

Either >= 99% entire floating point range hashes to a single value seriously comprimising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).

Or the epsilon is such that only exact matches are permitted.

Granular

You could of course go for a granular approach instead.

Under this approach you define exact buckets down to a particular resolution. ie:

[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...

Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.

Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.

answered Apr 29 at 2:13

Kain0_0

4,832420

answered Apr 29 at 2:13

Kain0_0

4,832420

answered Apr 29 at 2:13

Kain0_0

4,832420

answered Apr 29 at 2:13

Kain0_0

4,832420

2

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

4

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

2

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

1

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

add a comment |

2

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

4

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

2

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

1

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

– Neil
Apr 29 at 6:35

@DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

– J. Fabian Meier
Apr 29 at 8:59

The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

– Doc Brown
Apr 29 at 15:26

While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

– Carsten S
Apr 30 at 9:43

@CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

– Kain0_0
Apr 30 at 23:50

add a comment |

Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.

BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!

#Pseudo code
class Temperature:
 def __init__(self, degrees):
 #CHECK INVALID VALUES HERE
 #TRANSFORM TO KELVIN HERE
 self.degrees = Math.floor(kelvin/EPSILON)

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

add a comment |

Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.

BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!

#Pseudo code
class Temperature:
 def __init__(self, degrees):
 #CHECK INVALID VALUES HERE
 #TRANSFORM TO KELVIN HERE
 self.degrees = Math.floor(kelvin/EPSILON)

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

add a comment |

Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.

BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!

#Pseudo code
class Temperature:
 def __init__(self, degrees):
 #CHECK INVALID VALUES HERE
 #TRANSFORM TO KELVIN HERE
 self.degrees = Math.floor(kelvin/EPSILON)

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

Just divide your temperature by EPSILON and take the floor of it, now your hash and your equal will behave in sync. In Python 3 the integer is unbounded, EPSILON can be smaller if you like.

BEWARE
If you change the value of EPSILON and you have serialised the object they will be not compatible!

#Pseudo code
class Temperature:
 def __init__(self, degrees):
 #CHECK INVALID VALUES HERE
 #TRANSFORM TO KELVIN HERE
 self.degrees = Math.floor(kelvin/EPSILON)

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

edited Apr 30 at 9:45

Glorfindel

2,39541727

edited Apr 30 at 9:45

Glorfindel

2,39541727

edited Apr 30 at 9:45

Glorfindel

2,39541727

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

answered Apr 29 at 21:55

Alessandro Teruzzi

1994

add a comment |

Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:

Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.

Store each item within the hash table using keys that are above and below the value being sought.

answered Apr 29 at 15:50

supercat

7,1661727

add a comment |

Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:

Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.

Store each item within the hash table using keys that are above and below the value being sought.

answered Apr 29 at 15:50

supercat

7,1661727

add a comment |

Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:

Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.

Store each item within the hash table using keys that are above and below the value being sought.

answered Apr 29 at 15:50

supercat

7,1661727

Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using a couple of approaches or a combination thereof:

Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.

Store each item within the hash table using keys that are above and below the value being sought.

answered Apr 29 at 15:50

supercat

7,1661727

answered Apr 29 at 15:50

supercat

7,1661727

answered Apr 29 at 15:50

supercat

7,1661727

answered Apr 29 at 15:50

supercat

7,1661727

add a comment |

You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.

answered Apr 29 at 14:00

gnasher729

21.1k22762

add a comment |

You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.

answered Apr 29 at 14:00

gnasher729

21.1k22762

add a comment |

You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.

answered Apr 29 at 14:00

gnasher729

21.1k22762

You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.

answered Apr 29 at 14:00

gnasher729

21.1k22762

answered Apr 29 at 14:00

gnasher729

21.1k22762

answered Apr 29 at 14:00

gnasher729

21.1k22762

answered Apr 29 at 14:00

gnasher729

21.1k22762

add a comment |

This isn't an answer, but an extended comment that may be helpful.

answered Apr 29 at 14:45

BurnsBA

1011

add a comment |

This isn't an answer, but an extended comment that may be helpful.

answered Apr 29 at 14:45

BurnsBA

1011

add a comment |

This isn't an answer, but an extended comment that may be helpful.

answered Apr 29 at 14:45

BurnsBA

1011

This isn't an answer, but an extended comment that may be helpful.

answered Apr 29 at 14:45

BurnsBA

1011

answered Apr 29 at 14:45

BurnsBA

1011

answered Apr 29 at 14:45

BurnsBA

1011

answered Apr 29 at 14:45

BurnsBA

1011

add a comment |

protected by gnat Apr 30 at 5:15

This page is only for reference, If you need detailed information, please check here

RgPIsv,9hNzgqpJZ,DgnrP57,r5T3D71o7sZeYl 5yQCrN3KyeQUbpT6E0H1m58oVuSN5FfifkIH3PFA3xVHdsQ3tTvD9x7

搜尋此網誌

Otdfbt

6 Answers
6

protected by gnat Apr 30 at 5:15

6 Answers
6

6 Answers
6

protected by gnat Apr 30 at 5:15

Popular posts from this blog

Vilaño, A Laracha Índice Patrimonio | Lugares e parroquias | Véxase tamén | Menú de navegación43°14′52″N 8°36′03″O / 43.24775, -8.60070

6 Answers 6

protected by gnat Apr 30 at 5:15

6 Answers 6

6 Answers 6

protected by gnat Apr 30 at 5:15

Popular posts from this blog

Vilaño, A Laracha Índice Patrimonio | Lugares e parroquias | Véxase tamén | Menú de navegación43°14′52″N 8°36′03″O﻿ / ﻿43.24775, -8.60070

6 Answers
6

6 Answers
6

6 Answers
6

Vilaño, A Laracha Índice Patrimonio | Lugares e parroquias | Véxase tamén | Menú de navegación43°14′52″N 8°36′03″O / 43.24775, -8.60070