Lecture 14 - Hashing
http://cr.yp.to/
http://www.cse.unsw.edu.au/~cs2521/18x1/exams/Algos/
Why
Access closer to O(1) constant time
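// For crypto hashes
- pre-image resistance - given hash(m), hard to find m
- second pre-image resistance - given m1, hard to find a distinct m2 with hash(m2) == hash(m1)
- collision resistance - hard to find any pair of distinct messages with the same hash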
// mod 128 is the same as & 0b01111111 (keeps the low 7 bits; only works because 128 is a power of two)
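A minimal sketch of the mod/mask equivalence above. The function names are made up for illustration, and the mask is written as `0x7F` because `0b...` literals are a compiler extension before C23.

```c
#include <stdint.h>

// Reduce a hash value into the range 0..127 using modulo.
uint32_t reduce_mod(uint32_t h) {
    return h % 128;
}

// Same reduction using a bit mask: 0x7F == 0b01111111.
// Equivalent only because 128 is a power of two and h is unsigned.
uint32_t reduce_mask(uint32_t h) {
    return h & 0x7F;
}
```

For example, 300 is 0b100101100, so both versions keep the low 7 bits 0b0101100, which is 44.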
What
- For a table of size N, output range from 0 to N-1
- Purely deterministic (same results for h(k,N))
- Uniform distribution *
- Cheap to compute
*: e.g. if we hash by summing characters, the distribution is not uniform - lower hash values are more likely to come from keys with an 'a' in them
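A rough sketch of the "summing characters" hash mentioned above, next to one common alternative. Both function names are hypothetical, and the multiplier 31 in `hash_poly` is an assumption (a conventional choice, not something from the lecture).

```c
#include <stddef.h>

// Hash a string by summing its character codes, reduced to 0..N-1.
// Deterministic and cheap, but not uniform: anagrams always collide.
size_t hash_sum(const char *key, size_t N) {
    size_t sum = 0;
    for (const char *p = key; *p != '\0'; p++) {
        sum += (unsigned char)*p;
    }
    return sum % N;
}

// Alternative: multiply the running value before adding each character,
// so the position of a character affects the result.
size_t hash_poly(const char *key, size_t N) {
    size_t h = 0;
    for (const char *p = key; *p != '\0'; p++) {
        h = h * 31 + (unsigned char)*p;
    }
    return h % N;
}
```

With `hash_sum`, "abc" and "cba" land in the same slot for every N, while `hash_poly` can tell "ab" and "ba" apart.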
How
Hash Collision / Collision Resolution
If two keys hash to the same value, we have a few options:
- Chaining - allow multiple items in a single location (i.e. array of item arrays, array of linked lists)
- Probing - try new indices until a free slot is found
- Remake the array, rehash!!! rip
// Best and Worst Case
Given N slots and M items
Best case - all lists have length M/N (uniformly distributed)
Worst case - one list has length M (all the items), other lists are empty
A good hash function, when M <= N - cost O(1)
A good hash function, when M > N - cost O(M/N)
The M/N ratio is called the load (alpha)
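A minimal sketch of the array-of-linked-lists approach, with the load computed as M/N. It assumes unsigned integer keys and `key % N` as the hash; all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    unsigned key;
    struct Node *next;
} Node;

typedef struct {
    Node **slots;   // N list heads
    size_t N;       // number of slots
    size_t M;       // number of items stored
} ChainTable;

// Create an empty table with N slots (assumes N >= 1).
ChainTable *chain_new(size_t N) {
    ChainTable *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->slots = malloc(N * sizeof *t->slots);
    if (t->slots == NULL) { free(t); return NULL; }
    for (size_t i = 0; i < N; i++) t->slots[i] = NULL;
    t->N = N;
    t->M = 0;
    return t;
}

// Walk the one list the key hashes to: O(length of that list).
bool chain_contains(const ChainTable *t, unsigned key) {
    for (Node *n = t->slots[key % t->N]; n != NULL; n = n->next) {
        if (n->key == key) return true;
    }
    return false;
}

// Insert at the head of the list; duplicates are ignored.
bool chain_insert(ChainTable *t, unsigned key) {
    if (chain_contains(t, key)) return true;
    Node *n = malloc(sizeof *n);
    if (n == NULL) return false;
    size_t i = key % t->N;
    n->key = key;
    n->next = t->slots[i];
    t->slots[i] = n;
    t->M++;
    return true;
}

// The load (alpha) = M / N.
double chain_load(const ChainTable *t) {
    return (double)t->M / (double)t->N;
}

void chain_free(ChainTable *t) {
    for (size_t i = 0; i < t->N; i++) {
        Node *n = t->slots[i];
        while (n != NULL) {
            Node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(t->slots);
    free(t);
}
```

Inserting 1, 5 and 9 into a 4-slot table puts all three in slot 1 (the worst case for those items) and gives a load of 0.75.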
//
Linear Probing | Open-address hashing - Using the next available slot
Access n - if full then try n+1, n+2, … wrapping around to 0, 1, 2, …
Average number of probes per search with linear probing, at load alpha:
Successful: 1/2 (1 + 1/(1-alpha))
Unsuccessful: 1/2 (1 + 1/(1-alpha)^2)
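To get a feel for how quickly these costs grow with the load, the two formulas can be evaluated directly. A small sketch, with hypothetical function names, valid for 0 <= alpha < 1:

```c
// Expected probes for a successful search under linear probing.
double probes_hit(double alpha) {
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Expected probes for an unsuccessful search under linear probing.
double probes_miss(double alpha) {
    double d = 1.0 - alpha;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At alpha = 0.5 this gives 1.5 and 2.5 probes; at alpha = 0.9 it gives about 5.5 and 50.5, which is why open-address tables are usually kept well below full.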
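Deleting is painful.. after removing an item, have to remove and reinsert the values that trail it (everything up to the next empty slot), otherwise later searches stop early at the gap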
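A sketch of linear probing, including the remove-and-reinsert deletion described above. It assumes non-negative int keys, a fixed table size `LP_N`, and `key % LP_N` as the hash; every name here is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define LP_N 8           // table size, fixed for this sketch
#define LP_EMPTY (-1)    // marks an unused slot, so keys must be >= 0

typedef struct {
    int slots[LP_N];
    size_t count;
} LPTable;

void lp_init(LPTable *t) {
    for (size_t i = 0; i < LP_N; i++) t->slots[i] = LP_EMPTY;
    t->count = 0;
}

static size_t lp_hash(int key) { return (size_t)key % LP_N; }

// Returns the slot index holding key, or -1 if it is absent.
int lp_find(const LPTable *t, int key) {
    size_t i = lp_hash(key);
    for (size_t probes = 0; probes < LP_N; probes++) {
        if (t->slots[i] == LP_EMPTY) return -1;  // gap: key cannot be further on
        if (t->slots[i] == key) return (int)i;
        i = (i + 1) % LP_N;                      // try n+1, wrapping to 0
    }
    return -1;
}

// Returns false only if the table is full.
bool lp_insert(LPTable *t, int key) {
    if (lp_find(t, key) >= 0) return true;
    if (t->count == LP_N) return false;
    size_t i = lp_hash(key);
    while (t->slots[i] != LP_EMPTY) i = (i + 1) % LP_N;
    t->slots[i] = key;
    t->count++;
    return true;
}

// Remove key, then take out and reinsert each trailing item in the cluster.
void lp_delete(LPTable *t, int key) {
    int found = lp_find(t, key);
    if (found < 0) return;
    size_t i = (size_t)found;
    t->slots[i] = LP_EMPTY;
    t->count--;
    i = (i + 1) % LP_N;
    while (t->slots[i] != LP_EMPTY) {
        int moved = t->slots[i];
        t->slots[i] = LP_EMPTY;
        t->count--;
        lp_insert(t, moved);
        i = (i + 1) % LP_N;
    }
}
```

With keys 1, 9 and 17 (all hashing to slot 1), the items sit in slots 1, 2 and 3; deleting 9 moves 17 back to slot 2 so it is still reachable from slot 1.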
// Double hashing
If position hash1(k,N) is full, then insert into hash1(k,N)+hash2(k,N)
If hash1(k,N)+hash2(k,N) is full, then try hash1(k,N)+hash2(k,N)+hash2(k,N), and so on (all positions taken mod N)
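A sketch of the probe sequence for double hashing. The names `hash1` and `hash2` follow the notes, but the formulas inside them are assumptions chosen for illustration, as is the helper `probe`. The step from `hash2` must never be 0, and picking N prime means every slot is eventually tried.

```c
#include <stddef.h>

// Primary hash: the home slot for key in a table of size N.
size_t hash1(unsigned key, size_t N) {
    return key % N;
}

// Secondary hash: a step size in 1..N-1 (assumes N >= 2).
size_t hash2(unsigned key, size_t N) {
    return 1 + (key / N) % (N - 1);
}

// Slot tried on attempt i, where attempt 0 is the home slot:
// hash1, hash1 + hash2, hash1 + 2*hash2, ... all mod N.
size_t probe(unsigned key, size_t i, size_t N) {
    return (hash1(key, N) + i * hash2(key, N)) % N;
}
```

With N = 7, key 10 has home slot 3 and step 2, so it tries 3, 5, 0, 2, ... Key 3 shares the home slot but has step 1, so the two keys diverge after the first collision, unlike linear probing.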
Dynamic Hashtable Resizing .. increase the size.. but then have to rehash things
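A sketch of resize-and-rehash on top of linear probing. The policy here (double the table whenever an insert would push the load above 0.5) is an assumption, not something the notes specify, and all names are hypothetical. Keys are non-negative ints.

```c
#include <stdbool.h>
#include <stdlib.h>

#define DT_EMPTY (-1)   // marks an unused slot, so keys must be >= 0

typedef struct {
    int *slots;
    size_t N;   // number of slots
    size_t M;   // number of items
} DynTable;

// Set up an empty table with N slots (assumes N >= 1).
bool dt_init(DynTable *t, size_t N) {
    t->slots = malloc(N * sizeof *t->slots);
    if (t->slots == NULL) return false;
    for (size_t i = 0; i < N; i++) t->slots[i] = DT_EMPTY;
    t->N = N;
    t->M = 0;
    return true;
}

// Linear-probe placement; assumes a free slot exists and key is absent.
static void dt_place(DynTable *t, int key) {
    size_t i = (size_t)key % t->N;
    while (t->slots[i] != DT_EMPTY) i = (i + 1) % t->N;
    t->slots[i] = key;
    t->M++;
}

// Double the table. Every item is rehashed because its index depends on N.
static bool dt_grow(DynTable *t) {
    DynTable bigger;
    if (!dt_init(&bigger, t->N * 2)) return false;
    for (size_t i = 0; i < t->N; i++) {
        if (t->slots[i] != DT_EMPTY) dt_place(&bigger, t->slots[i]);
    }
    free(t->slots);
    *t = bigger;
    return true;
}

// Terminates because the load never exceeds 0.5, so a gap always exists.
bool dt_contains(const DynTable *t, int key) {
    size_t i = (size_t)key % t->N;
    while (t->slots[i] != DT_EMPTY) {
        if (t->slots[i] == key) return true;
        i = (i + 1) % t->N;
    }
    return false;
}

bool dt_insert(DynTable *t, int key) {
    if (dt_contains(t, key)) return true;
    if (2 * (t->M + 1) > t->N && !dt_grow(t)) return false;
    dt_place(t, key);
    return true;
}
```

Each resize costs O(N) because everything moves, but doubling means resizes happen rarely enough that the average cost per insert stays constant.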
Performance
gprof (compile programs with -pg, then run the program once so it writes gmon.out)
$> gprof ./prog
Lowercase and uppercase ASCII letters differ by one bit :p (+32, 0b00100000)
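A tiny sketch of that one-bit difference: toggling bit 5 flips the case of an ASCII letter. The function name is hypothetical, and the code assumes an ASCII character set.

```c
// Flip the case of an ASCII letter by toggling bit 5
// (0x20 == 32 == 0b00100000). Other characters are returned unchanged.
char toggle_case(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
        return (char)(c ^ 0x20);
    }
    return c;
}
```

For example, 'a' is 97 (0b01100001) and 'A' is 65 (0b01000001), so XOR with 0x20 maps each to the other.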
Hash Table ADT