TDSM 12.21

From The Data Science Design Manual Wikia
Jump to: navigation, search

Hashtable collision: When more than one value has the same hash value, there is a collision as the value can't be stored in it's original place because there's already a value there.

To avoid hash map collision, there are several techniques, one of which is linear probing, in which if a value can't be stored at it's index, then store it in the next nearest empty slot.

The frequency of hashmap collision depends on the size of the hashmap and the distribution of data. For a data which has a lot of similar content, it'll have more collisions. Also, small size of the hashmap will also result in more collisions.

---

By using universal hashing or perfect hashing, the expected time for insertion and lookup will be O(1). Also, both universal hashing and perfect hashing do not assume the distribution of the input data. I just learned this in CSE 548 Analysis of Algorithms this semester.