How to implement float hashing with approximate equality










15















Let's say we have the following Python class (the problem exists in Java just the same with equals and hashCode)



class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees


where degrees is the temperature in Kelvin as a float. Now, I would like to implement equality testing and hashing for Temperature in a way that



  • compares floats up to an epsilon difference instead of direct equality testing,

  • and honors the contract that a == b implies hash(a) == hash(b).

    def __eq__(self, other):
        return abs(self.degrees - other.degrees) < EPSILON

    def __hash__(self):
        return # What goes here?


The Python documentation talks a bit about hashing numbers to ensure that hash(2) == hash(2.0) but this is not quite the same problem.



Am I even on the right track? And if so, what is the standard way to implement hashing in this situation?



Update: Now I understand that this type of equality testing for floats eliminates the transitivity of == and equals. But how does that go together with the "common knowledge" that floats should not be compared directly? If you implement an equality operator by comparing floats, static analysis tools will complain. Are they right to do so?





























  • 8





    Why does the question have the Java tag?

    – Laiv
    Apr 29 at 7:53







  • 8





    About your update: I would say that hashing floats is generally a questionable thing. Try to avoid using floats as keys or as set elements.

    – J. Fabian Meier
    Apr 29 at 9:04






  • 6





    @Neil: At the same time, doesn't rounding sound like integers? By that I mean: if you can round to, say, thousandths of degrees, then you could simply use a fixed-point representation -- an integer expressing the temperature in thousandths of degrees. For ease of use, you could have a getter/setter transparently converting from/to floats if you wish to...

    – Matthieu M.
    Apr 29 at 11:12







  • 4





    Kelvins are no longer degrees. Degrees are also ambiguous. Why not just call it kelvin?

    – Solomon Ucko
    Apr 29 at 12:01






  • 5





    Python has more-or-less excellent fixed-point support, maybe that’s something for you.

    – Jonas Schäfer
    Apr 29 at 14:07

















java python hashing floating-point














edited Apr 30 at 9:43
Glorfindel

asked Apr 28 at 23:41
CQQL









6 Answers


















41















implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do a fuzzy equality with, for example, an epsilon slightly larger than 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.
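The loss of transitivity can be reproduced in a few lines. This is a minimal sketch that reuses the question's `__eq__`; the value `EPSILON = 0.15` is an assumption chosen for illustration, because with the strict `<` from the question a difference of exactly epsilon would not count as equal.

```python
EPSILON = 0.15  # assumed tolerance, chosen only for this illustration


class Temperature:
    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        # Fuzzy comparison as proposed in the question
        return abs(self.degrees - other.degrees) < EPSILON


a, b, c = Temperature(0.1), Temperature(0.2), Temperature(0.3)

print(a == b)  # True: the values differ by about 0.1
print(b == c)  # True: the values differ by about 0.1
print(a == c)  # False: the values differ by about 0.2
```

Any container or algorithm that assumes equality is an equivalence relation may behave differently depending on which pairs happen to be compared.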



Although Python does not document such a requirement, the implications of a non-transitive equality still make it a very bad idea; reasoning about such types is headache-inducing.



So I strongly recommend you don't do that.



Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



(But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)
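The first option above (exact equality and hashing, plus a separate fuzzy method) might look like the following sketch. The method name `is_close` and the default tolerance are assumptions made for illustration; `math.isclose` is the standard-library helper for tolerance comparisons.

```python
import math


class Temperature:
    def __init__(self, degrees):
        self.degrees = float(degrees)

    def __eq__(self, other):
        # Exact equality keeps == transitive and consistent with __hash__
        if not isinstance(other, Temperature):
            return NotImplemented
        return self.degrees == other.degrees

    def __hash__(self):
        return hash(self.degrees)

    def is_close(self, other, tolerance=1e-9):
        # Hypothetical helper: fuzzy matching lives outside of ==
        return math.isclose(self.degrees, other.degrees,
                            rel_tol=0.0, abs_tol=tolerance)


t1 = Temperature(293.15)
t2 = Temperature(293.15 + 1e-12)

print(t1 == t2)         # exact comparison: the two floats differ
print(t1.is_close(t2))  # fuzzy comparison: within the tolerance
```

Callers that need approximate matching then ask for it explicitly, while sets and dictionaries keep relying on the exact semantics.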
























  • 2





    Interesting thoughts. So by accumulating millions of epsilons and applying transitivity, you can conclude that anything is equal to anything else :-) But does this mathematical constraint acknowledge the discrete foundation of floating-point numbers, which in many cases are approximations of the number they are intended to represent?

    – Christophe
    Apr 29 at 6:51











  • @Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

    – Ordous
    Apr 29 at 14:50











  • Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

    – HTNW
    Apr 29 at 16:56











  • @HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

    – MSalters
    Apr 30 at 9:58











  • @MSalters? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() { foo(); } is a warning, but @Deprecated void bar() { foo(); } is not. Maybe many tools don't support this, but some might.

    – HTNW
    Apr 30 at 12:24


















16














Good Luck



You are not going to be able to achieve that, without being stupid with hashes, or sacrificing the epsilon.



Example:



Assume that each point hashes to its own unique hash value.



As floating point numbers are sequential there will be up to k numbers prior to a given floating point value, and up to k numbers after a given floating point value which are within some epsilon of the given point.




  1. For each pair of points within epsilon of each other that do not share the same hash value:



    • Adjust the hashing scheme so that these two points hash to the same value.


  2. By induction over all such pairs, the entire sequence of floating point numbers collapses toward a single hash value.

There are a few cases where this will not hold true:



  • Positive/Negative Infinity

  • NaN

  • A few De-normalised ranges that may not be linkable to the main range for a given epsilon.

  • perhaps a few other format specific instances

However >=99% of the floating point range will hash to a single value for any value of epsilon that includes at least one floating point value above or below some given floating point value.
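How many neighbours k a value has depends on the spacing of floats at that magnitude. The sketch below gives a rough estimate using `math.ulp` (available from Python 3.9); the function name `neighbours_within` and the sample epsilon are assumptions, and the count is only approximate because the spacing changes at powers of two.

```python
import math


def neighbours_within(value, epsilon):
    """Rough count of representable doubles between value and value + epsilon."""
    # math.ulp(value) is the gap between value and the next larger double
    return int(epsilon // math.ulp(value))


EPSILON = 1e-9  # assumed tolerance for this illustration

# Near room temperature in kelvin, thousands of doubles fall within EPSILON,
# so the chain of "equal" neighbours links the whole surrounding range.
print(neighbours_within(300.0, EPSILON))

# At very large magnitudes the gap between doubles exceeds EPSILON,
# so each value is only "equal" to itself.
print(neighbours_within(1e20, EPSILON))
```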



Outcome



Either >= 99% of the entire floating point range hashes to a single value, seriously compromising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



Or the epsilon is such that only exact matches are permitted.



Granular



You could of course go for a granular approach instead.



Under this approach you define exact buckets down to a particular resolution, i.e.:



[0.001, 0.002)
[0.002, 0.003)
[0.003, 0.004)
...
[122.999, 123.000)
...


Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



Unfortunately it is still possible for two floats to be epsilon distance away, and have two separate hashes.
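The bucket idea and its boundary problem can be sketched as follows. The function name `bucket`, the resolution of 0.001 and the sample values are assumptions picked to match the bucket list above.

```python
import math

RESOLUTION = 0.001  # assumed bucket width, matching the list above


def bucket(value, resolution=RESOLUTION):
    # Index of the half-open interval [k * resolution, (k + 1) * resolution)
    return math.floor(value / resolution)


a = 0.0019999  # just below the boundary at 0.002
b = 0.0020001  # just above the boundary at 0.002
c = 0.0029999  # far from b, but still in the same bucket

print(bucket(a), bucket(b), bucket(c))

# a and b are very close, yet land in different buckets,
# while b and c are almost a full bucket apart and compare equal.
print(abs(a - b) < 1e-6, bucket(a) == bucket(b))
print(abs(b - c) < 1e-6, bucket(b) == bucket(c))
```

Bucketing therefore gives a consistent equality and hash, but it is a different relation from "within epsilon of each other".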
























  • 2





    I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

    – Neil
    Apr 29 at 6:35






  • 4





    @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

    – J. Fabian Meier
    Apr 29 at 8:59






  • 2





    The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighbouring buckets (or at least one of them) for their content as well. That eliminates the problem of those edge cases at the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

    – Doc Brown
    Apr 29 at 15:26












  • While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

    – Carsten S
    Apr 30 at 9:43






  • 1





    @CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high-range values that are separated by more than epsilon and will hash to their own unique buckets.

    – Kain0_0
    Apr 30 at 23:50


















7














You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius, i.e. 0 kelvin), so that bound can map to 0 in your underlying integer. The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.



Just divide your temperature by EPSILON and take the floor of it; now your hash and your equality will behave in sync. In Python 3 integers are unbounded, so EPSILON can be as small as you like.



BEWARE
If you change the value of EPSILON after you have serialised objects, the stored values will no longer be compatible!



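import math

EPSILON = 0.001  # granularity in kelvin; illustrative value

class Temperature:
    def __init__(self, degrees):
        # degrees is assumed to already be in kelvin, as in the question;
        # convert here if another scale is accepted
        if not math.isfinite(degrees) or degrees < 0:
            raise ValueError("temperature must be a finite number >= 0 K")
        # Store the index of the EPSILON-wide bucket as an integer
        self.degrees = math.floor(degrees / EPSILON)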
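With the value stored as an integer, equality and hashing follow directly. The sketch below is one possible completion, not the answer author's code: the class name `QuantizedTemperature`, the attribute `steps`, the `kelvin` property and the value of `EPSILON` are all assumptions.

```python
import math

EPSILON = 0.001  # assumed granularity in kelvin


class QuantizedTemperature:
    def __init__(self, kelvin):
        # Integer number of EPSILON-sized steps above absolute zero
        self.steps = math.floor(kelvin / EPSILON)

    def __eq__(self, other):
        if not isinstance(other, QuantizedTemperature):
            return NotImplemented
        return self.steps == other.steps

    def __hash__(self):
        # Equal objects have equal steps, hence equal hashes
        return hash(self.steps)

    @property
    def kelvin(self):
        # Representative value of the bucket, converted back to a float
        return self.steps * EPSILON


a = QuantizedTemperature(293.1504)
b = QuantizedTemperature(293.1506)
c = QuantizedTemperature(293.1516)

print(a == b, hash(a) == hash(b))  # same bucket
print(a == c)                      # different bucket
print(len({a, b, c}))              # number of distinct buckets
```

Because the comparison is on integers, it is transitive, and the hash contract holds by construction.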





































    1














    Implementing a floating-point hash table that can find things that are "approximately equal" to a given key requires one of two approaches, or a combination thereof:



    1. Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.


    2. Store each item within the hash table twice, using the rounded keys above and below the item's own value, so that a single lookup suffices.


    Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required. If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
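A simplified variant of the first approach is sketched below. It is an assumption-laden illustration rather than the exact scheme described above: each value is stored once in a floor-based bucket that is wider than the tolerance, and a lookup scans that bucket plus both neighbours, then filters by the real distance. The class name `FuzzyIndex` and its methods are hypothetical.

```python
import math
from collections import defaultdict


class FuzzyIndex:
    def __init__(self, tolerance):
        self.tolerance = tolerance
        # Bucket width larger than the tolerance, so a match can only be
        # in the query's own bucket or in one of its direct neighbours
        self.width = 2 * tolerance
        self._buckets = defaultdict(list)

    def _key(self, value):
        return math.floor(value / self.width)

    def add(self, value, item):
        # Each bucket holds a list, since several items may share a key
        self._buckets[self._key(value)].append((value, item))

    def find(self, value):
        key = self._key(value)
        matches = []
        for k in (key - 1, key, key + 1):
            for stored, item in self._buckets.get(k, ()):
                if abs(stored - value) <= self.tolerance:
                    matches.append(item)
        return matches


index = FuzzyIndex(tolerance=0.01)
index.add(293.15, "a")
index.add(293.158, "b")
index.add(300.0, "c")

print(index.find(293.155))  # items within 0.01 of the query
print(index.find(299.0))    # nothing nearby
```

The lookup cost stays constant per query, at the price of inspecting a few buckets instead of one.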




































      0














      You can of course define “almost equal” by deleting say the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may be different.
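Dropping the low mantissa bits can be sketched with the standard `struct` module, assuming IEEE 754 doubles (which is what CPython's float uses on common platforms). The function name `truncate_mantissa` is hypothetical.

```python
import struct


def truncate_mantissa(value, bits=8):
    """Clear the lowest `bits` bits of the mantissa of a double."""
    # Reinterpret the double as a 64-bit unsigned integer
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    raw &= ~((1 << bits) - 1)
    # Reinterpret the masked integer as a double again
    (result,) = struct.unpack("<d", struct.pack("<Q", raw))
    return result


x = 1.0 + 255 * 2.0 ** -52  # low eight mantissa bits all set
y = 1.0 + 256 * 2.0 ** -52  # the very next double after x

print(truncate_mantissa(x) == 1.0)                   # x collapses onto 1.0
print(truncate_mantissa(y) == y)                     # y is already a multiple of 2**8 ulps
print(truncate_mantissa(x) == truncate_mantissa(y))  # adjacent floats end up different
```

The last line shows the problem mentioned above: x and y are neighbouring doubles, yet they truncate to different values.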



      There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==”. Sometimes you don’t want to check for equality, but when you do, “==” is the way to go.




































        0














        This isn't an answer, but an extended comment that may be helpful.



        I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.



        I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding; for example, 7*√2 - 5*√2 will yield exactly 2*√2 instead of a rounded approximation such as 2.82842713. Of course, this will introduce additional complications that may or may not be worth it.









































          6 Answers
          6






          active

          oldest

          votes








          6 Answers
          6






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          41















          implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




          Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



          While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



          So I strongly recommend you don't do that.



          Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



          (But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






          share|improve this answer


















          • 2





            interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

            – Christophe
            Apr 29 at 6:51











          • @Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

            – Ordous
            Apr 29 at 14:50











          • Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

            – HTNW
            Apr 29 at 16:56











          • @HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

            – MSalters
            Apr 30 at 9:58











          • @MSalters ? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() foo(); is a warning, but @Deprecated void bar() foo(); is not. Maybe many tools don't support this, but some might.

            – HTNW
            Apr 30 at 12:24















          41















          implement equality testing and hashing for Temperature in a way that compares floats up to an epsilon difference instead of direct equality testing,




          Fuzzy equality violates the requirements that Java places on the equals method, namely transitivity, i.e. that if x == y and y == z, then x == z. But if you do an fuzzy equality with, for example, an epsilon of 0.1, then 0.1 == 0.2 and 0.2 == 0.3, but 0.1 == 0.3 does not hold.



          While Python does not document such a requirement, still the implications of having a non-transitive equality make it a very bad idea; reasoning about such types is headache-inducing.



          So I strongly recommend you don't do that.



          Either provide exact equality and base your hash on that in the obvious way, and provide a separate method to do the fuzzy matching, or go with the equivalence class approach suggested by Kain. Though in the latter case, I recommend you fix your value to a representative member of the equivalence class in the constructor, and then go with simple exact equality and hashing for the rest; it's much easier to reason about the types this way.



          (But if you do that, you might as well use a fixed point representation instead of floating point, i.e. you use an integer to count thousandths of a degree, or whatever precision you require.)






          share|improve this answer


















          • 2





            interesting thoughts. So by accumulating millions of epsilon and with transitivity you can conclude that anything is equal to anything else :-) But does this mathematic constraint acknowledge the discrete foundation of floating points, which in many cases are approximations of the number they are intended to represent ?

            – Christophe
            Apr 29 at 6:51











          • @Christophe Interesting question. If you think about it, you'll see that this approach will make a single large equivalence class out of floats whose resolution is greater than epsilon (it's centered on 0, of course) and leave the other floats in their own class each. But that's not the point, the real problem is that whether it concludes that 2 numbers are equal depends on whether there is a third one compared and the order in which that is done.

            – Ordous
            Apr 29 at 14:50











          • Addressing @OP's edit, I would add that the incorrectness of floating-point == should "infect" the == of types containing them. That is, if they follow your advice of providing an exact equality, then their static analysis tool should further be configured to warn when equality is used on Temperature. It's the only thing you can do, really.

            – HTNW
            Apr 29 at 16:56











          • @HTNW: That would be too simple. A ratio class might have a float approximation field which does not participate in ==. Besides, the static analysis tool will already give a warning inside the == implementation of classes when one of the members being compared is a float type.

            – MSalters
            Apr 30 at 9:58











          • @MSalters? Presumably, sufficiently configurable static analysis tools can do what I suggested just fine. If a class has a float field that doesn't participate in ==, then don't configure your tool to warn on == on that class. If the class does, then presumably marking the class's == as "too exact" will cause the tool to ignore that sort of error within the implementation. E.g. in Java, if @Deprecated void foo(), then void bar() { foo(); } is a warning, but @Deprecated void bar() { foo(); } is not. Maybe many tools don't support this, but some might.

            – HTNW
            Apr 30 at 12:24













          answered Apr 29 at 6:30









          Sebastian Redl




          16














          Good Luck



          You are not going to be able to achieve that without either making the hash useless or sacrificing the epsilon.



          Example:



          Assume that each point hashes to its own unique hash value.



          As floating point numbers are sequential, there will be up to k numbers before a given floating point value, and up to k numbers after it, that are within some epsilon of the given value.




          1. For each pair of points within epsilon of each other that do not share the same hash value:

            • Adjust the hashing scheme so that these two points hash to the same value.

          2. By induction over all such pairs, the entire sequence of floating point numbers will collapse toward a single hash value.

          There are a few cases where this will not hold true:



          • Positive/Negative Infinity

          • NaN

          • A few denormalised ranges that may not be linkable to the main range for a given epsilon.

          • Perhaps a few other format-specific instances

          However, >= 99% of the floating point range will hash to a single value for any epsilon that includes at least one floating point value above or below some given floating point value.
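The collapse can be illustrated on a small sample of values. The sketch below assumes the rule "within epsilon implies same hash" and groups sorted values into the classes that rule forces; the function name `forced_hash_classes` and the sample data are hypothetical.

```python
def forced_hash_classes(values, epsilon):
    """Group values into classes that must share one hash value.

    In one dimension, two values end up in the same class whenever a chain
    of neighbours, each within epsilon of the next, connects them.
    """
    classes = []
    for v in sorted(values):
        if classes and v - classes[-1][-1] <= epsilon:
            classes[-1].append(v)  # linked to the previous value
        else:
            classes.append([v])    # gap larger than epsilon: new class
    return classes


dense = [i * 0.05 for i in range(201)]  # 0.0 to 10.0 in steps of about 0.05
print(len(forced_hash_classes(dense, epsilon=0.1)))   # 1

sparse = [0.0, 1.0, 2.0]
print(len(forced_hash_classes(sparse, epsilon=0.1)))  # 3
```

Densely spaced values all chain together into one class, while values spaced further apart than epsilon keep their own classes, in which case the epsilon has no effect on them.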



          Outcome



          Either >= 99% of the entire floating point range hashes to a single value, seriously compromising the intent of a hash value (and any device/container relying on a fairly distributed low-collision hash).



          Or the epsilon is such that only exact matches are permitted.



          Granular



          You could of course go for a granular approach instead.



          Under this approach you define exact buckets down to a particular resolution, i.e.:



          [0.001, 0.002)
          [0.002, 0.003)
          [0.003, 0.004)
          ...
          [122.999, 123.000)
          ...


          Each bucket has a unique hash, and any floating point within the bucket compares equal to any other float in the same bucket.



          Unfortunately it is still possible for two floats to be less than epsilon apart and yet fall into different buckets, and so have two separate hashes.
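A minimal sketch of the bucket idea, assuming a bucket width of 0.001 as in the listing above. The names `RESOLUTION` and `bucket_key` are hypothetical.

```python
import math

RESOLUTION = 0.001  # assumed bucket width


def bucket_key(value, resolution=RESOLUTION):
    """Map a float to the integer index of the half-open bucket containing it."""
    return math.floor(value / resolution)


# Two values in the same bucket share a key, and therefore a hash.
print(bucket_key(0.0021) == bucket_key(0.0029))      # True

# Two values only 0.0002 apart, but on opposite sides of a bucket boundary.
print(bucket_key(122.9999) == bucket_key(123.0001))  # False
```

The second comparison shows the boundary problem: closeness alone does not guarantee a shared bucket.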
























          • 2





            I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

            – Neil
            Apr 29 at 6:35






          • 4





            @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

            – J. Fabian Meier
            Apr 29 at 8:59






          • 2





            The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighbouring buckets (or at least one of them) for their content as well. That eliminates the problem of those edge cases at the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

            – Doc Brown
            Apr 29 at 15:26












          • While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

            – Carsten S
            Apr 30 at 9:43






          • 1





            @CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high-range values that are separated by more than epsilon and will hash to their own unique buckets.

            – Kain0_0
            Apr 30 at 23:50
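Doc Brown's suggestion of also checking the neighbouring buckets could be sketched as a lookup structure like the one below. This is an illustration under assumptions, not code from the thread: the class name `FuzzyIndex` and its methods are hypothetical, and the bucket width is taken to be equal to epsilon so that any match lies in the query's bucket or one of its two neighbours.

```python
import math
from collections import defaultdict


class FuzzyIndex:
    """Sketch: bucketed store that finds values within epsilon of a query."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self._buckets = defaultdict(list)

    def _key(self, value):
        # Bucket width equals epsilon (an assumption of this sketch).
        return math.floor(value / self.epsilon)

    def add(self, value):
        self._buckets[self._key(value)].append(value)

    def find_close(self, value):
        key = self._key(value)
        # Check the query's own bucket and both neighbours, then filter
        # candidates with the actual epsilon comparison.
        return [
            stored
            for k in (key - 1, key, key + 1)
            for stored in self._buckets.get(k, ())
            if abs(stored - value) <= self.epsilon
        ]


index = FuzzyIndex(epsilon=0.1)
index.add(20.09)
index.add(20.5)
print(index.find_close(20.11))  # [20.09]
print(index.find_close(21.0))   # []
```

Note that this is a fuzzy lookup, not an `__eq__`/`__hash__` pair: the structure does the epsilon matching itself, so the stored values can keep exact equality.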















          answered Apr 29 at 2:13









          Kain0_0

          4,832420




          4,832420







          • 2





            I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

            – Neil
            Apr 29 at 6:35






          • 4





            @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

            – J. Fabian Meier
            Apr 29 at 8:59






          • 2





            The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

            – Doc Brown
            Apr 29 at 15:26












          • While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

            – Carsten S
            Apr 30 at 9:43






          • 1





            @CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

            – Kain0_0
            Apr 30 at 23:50












          • 2





            I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

            – Neil
            Apr 29 at 6:35






          • 4





            @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

            – J. Fabian Meier
            Apr 29 at 8:59






          • 2





            The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

            – Doc Brown
            Apr 29 at 15:26












          • While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

            – Carsten S
            Apr 30 at 9:43






          • 1





            @CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high range values who are separated by more than epsilon that will hash to their own unique buckets.

            – Kain0_0
            Apr 30 at 23:50







          2




          2





          I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

          – Neil
          Apr 29 at 6:35





          I agree that the granular approach here would probably be best, if that fits OP's requirements. Though I'm afraid OP has like +/- 0.1% type requirements, meaning it can't be granular.

          – Neil
          Apr 29 at 6:35




          4




          4





          @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

          – J. Fabian Meier
          Apr 29 at 8:59





          @DocBrown The "not possible" part is correct. If epsilon based equality should imply that the hash codes are equal, then you automatically have all hash codes equal, so the hash function is not useful anymore. The buckets approach can be fruitful, but you will have numbers with different hash codes that are arbitrarily close to each other.

          – J. Fabian Meier
          Apr 29 at 8:59




          2




          2





          The bucket approach can be modified by checking not only the bucket with the exact hash key, but also the two neighboured buckets (or at least one of them) for their content as well. That elimininates the problem of those edge cases for the cost of increasing the running time by a factor of at most two (when implemented correctly). However, it does not change the general running time order.

          – Doc Brown
          Apr 29 at 15:26




















          While you are right in spirit, not everything will collapse. With a fixed small epsilon, most numbers will only equal themselves. Of course, for those the epsilon will be useless, so again, in spirit you are correct.

          – Carsten S
          Apr 30 at 9:43









          1









          @CarstenS Yes, my statement that 99% of the range hashes to a single hash does not actually cover the whole float range. There are many high-range values that are separated by more than epsilon and will hash to their own unique buckets.

          – Kain0_0
          Apr 30 at 23:50
















          7














          You can model your temperature as an integer under the hood. Temperature has a natural lower bound (-273.15 Celsius), so a double value of -273.15 maps to 0 in your underlying integer. The second element that you need is the granularity of your mapping. You are already using this granularity implicitly; it is your EPSILON.



          Just divide your temperature by EPSILON and take the floor of the result; now your hash and your equality will behave in sync. In Python 3 integers are unbounded, so EPSILON can be as small as you like.



          BEWARE
          If you change the value of EPSILON after you have serialised objects, the serialised values will no longer be compatible!



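          # Sketch (Python): keep the temperature as an integer count of EPSILON steps
          import math

          EPSILON = 0.1  # granularity in kelvin (example value)

          class Temperature:
              def __init__(self, degrees):
                  kelvin = degrees + 273.15  # transform Celsius to kelvin
                  if kelvin < 0:  # check invalid values
                      raise ValueError("below absolute zero")
                  self.degrees = math.floor(kelvin / EPSILON)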
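To show how the integer representation keeps equality and hashing in sync, here is a minimal, self-contained sketch. The `EPSILON` value of 0.1, the Celsius input, and the attribute name `steps` are assumptions made for illustration; they are not taken from the question.

```python
import math

EPSILON = 0.1  # assumed bucket width in kelvin; pick a value that suits your data


class Temperature:
    """Temperature stored as an integer count of EPSILON-wide steps above 0 K."""

    def __init__(self, degrees_celsius: float):
        kelvin = degrees_celsius + 273.15
        if kelvin < 0:
            raise ValueError("temperature below absolute zero")
        # The integer bucket is the only state that equality and hashing look at.
        self.steps = math.floor(kelvin / EPSILON)

    def __eq__(self, other):
        if not isinstance(other, Temperature):
            return NotImplemented
        return self.steps == other.steps

    def __hash__(self):
        return hash(self.steps)


# Values in the same bucket are equal and share a hash.
same_bucket = Temperature(20.01) == Temperature(20.04)
# Values only 0.02 apart can still straddle a bucket boundary and differ.
across_boundary = Temperature(20.04) == Temperature(20.06)
```

Because both `__eq__` and `__hash__` depend only on the integer bucket, the hash contract holds, but the last line illustrates the limitation raised in the comments: two values closer than `EPSILON` can still land in different buckets.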













              edited Apr 30 at 9:45









              Glorfindel











              answered Apr 29 at 21:55









              Alessandro Teruzzi





















                  1














                  Implementing a floating-point hash table that can find things that are "approximately equal" to a given key will require using one of a couple of approaches, or a combination thereof:



                  1. Round each value to an increment which is somewhat larger than the "fuzzy" range before storing it in the hash table, and when trying to find a value, check the hash table for the rounded values above and below the value sought.


                  2. Store each item in the hash table twice, under the rounded keys just above and just below its value, so that a single lookup of the rounded value sought is enough.


                  Note that using either approach will likely require that hash table entries not identify items, but rather lists, since there will likely be multiple items associated with each key. The first approach above will minimize the required hash table size, but each search for an item not in the table will require two hash-table lookups. The second approach will quickly be able to identify that items aren't in the table, but will generally require the table to hold about twice as many entries as would otherwise be required.

                  If one is trying to find objects in 2D space, it may be useful to use one approach for the X direction and one for the Y direction, so that instead of having each item stored once but requiring four query operations for each lookup, or being able to use one lookup to find an item but having to store each item four times, one would store each item twice and use two lookup operations to find it.
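The first approach can be sketched roughly as follows. This is a simplified variant, not the answer's exact scheme: it uses floor-based buckets and probes the home bucket plus both neighbours (three lookups rather than two), which is easier to reason about. The class name `FuzzyFloatIndex` and the choice of bucket width as twice the tolerance are assumptions for illustration.

```python
import math
from collections import defaultdict


class FuzzyFloatIndex:
    """Bucketed index: each value is stored once; lookups also probe neighbours."""

    def __init__(self, tolerance: float):
        self.tolerance = tolerance
        # Assumed bucket width: somewhat larger than the fuzzy range.
        self.width = 2 * tolerance
        # Each entry is a list, since several values can share a bucket.
        self.buckets = defaultdict(list)

    def _bucket(self, x: float) -> int:
        return math.floor(x / self.width)

    def add(self, x: float) -> None:
        self.buckets[self._bucket(x)].append(x)

    def find(self, x: float) -> list:
        """Return every stored value within `tolerance` of x."""
        b = self._bucket(x)
        hits = []
        for key in (b - 1, b, b + 1):
            # .get avoids creating empty buckets during lookups.
            for y in self.buckets.get(key, ()):
                if abs(x - y) <= self.tolerance:
                    hits.append(y)
        return hits


index = FuzzyFloatIndex(tolerance=0.05)
for value in (0.99, 1.04, 2.0):
    index.add(value)

near_one = sorted(index.find(1.0))   # matches on both sides of a bucket boundary
near_mid = index.find(1.5)           # nothing stored nearby
```

The final `abs(x - y)` check is what makes the result exact; the buckets only narrow down the candidates, so lookup cost stays proportional to the number of items in at most three buckets.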
















                      answered Apr 29 at 15:50









                      supercat





















                          0














                          You can of course define “almost equal” by deleting, say, the last eight bits of the mantissa and then comparing or hashing. The problem is that numbers very close to each other may still come out different, because they can fall on opposite sides of a truncation boundary.
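As a rough illustration of the mantissa-truncation idea, the sketch below clears the low bits of an IEEE 754 double using Python's `struct` module. The helper name `truncate_mantissa` and the sample values are hypothetical, chosen only to show both the intended effect and the boundary problem.

```python
import struct


def truncate_mantissa(x: float, bits: int = 8) -> float:
    """Clear the lowest `bits` bits of the 52-bit mantissa of a double."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    raw &= ~((1 << bits) - 1)  # zero out the low mantissa bits
    (y,) = struct.unpack("<d", struct.pack("<Q", raw))
    return y


# A tiny perturbation of 1.0 is absorbed by the truncation.
a = 1.0 + 4 * 2.0**-52
absorbed = truncate_mantissa(a) == truncate_mantissa(1.0)

# Two adjacent doubles on either side of a truncation boundary stay different.
below = 1.0 + 255 * 2.0**-52
above = 1.0 + 256 * 2.0**-52
split = truncate_mantissa(below) != truncate_mantissa(above)
```

Here `below` and `above` differ by a single unit in the last place, yet their truncated forms differ, which is exactly the caveat stated above.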



                          There is some confusion here: if two floating point numbers compare equal, they are equal. To check if they are equal, you use “==”. Sometimes you don’t want to check for equality, but when you do, “==” is the way to go.
















                              answered Apr 29 at 14:00









                              gnasher729





















                                  0














                                  This isn't an answer, but an extended comment that may be helpful.



                                  I have been working on a similar problem, while using MPFR (based on GNU MP). The "bucket" approach as outlined by @Kain0_0 seems to give acceptable results, but be aware of the limitations highlighted in that answer.



                                  I wanted to add that -- depending on what you are trying to do -- using an "exact" (caveat emptor) computer algebra system like Mathematica may help supplement or verify an inexact numerical program. This will allow you to compute results without worrying about rounding; for example, 7*√2 - 5*√2 will yield exactly 2*√2 instead of a rounded decimal such as 2.82842713 or similar. Of course, this will introduce additional complications that may or may not be worth it.
















                                      answered Apr 29 at 14:45









                                      BurnsBA















                                          protected by gnat Apr 30 at 5:15


