When it comes to fast computing, there are few things as powerful as the humble lookup-table. It comes in many sizes and flavours, implemented on the fly or through some elaborate framework (depending on the task at hand). Sometimes it is pre-calculated and sometimes it is lazily build up during the actual operation. This incarnation is often referred to as a cache and is a bit different than a lookup table. However, the pre-calculated lookup table is often ignored and can often result in significant speed improvements in applications.
A lookup table can be constructed by getting some hash value from the key and then using that numerical value in a straightforward hash table (HashTable or HashMap). The biggest problem with a hash table is collisions. Good keys and hashing strategies can help minimise the collisions, but collisions will always occur. Bucketing and linear probing are then often used to circumvent this problem.
Today I want to discuss a rather impressive technique called "Magics", which is based on De-Bruijn sequences. It is often used in chess engines and is currently responsible for the fastest move generators in existence. It is used by the majority of top chess engines and gives them a real compatitive edge. Fortunately, the same concept can also be applied to different applications. It does tend to be a bit niche, but so is many highly effective algorithms.
Consider the following problem: You have a list of 500 numbers and you need to calculate a very complex operation on each number. We'll use something boring like prime factorization for this discussion. The numbers are all withing the range 0..499. How would you do it? This answer is trivial, of course. A simple array with the number as the index value will give a collision-free lookup technique, something like:
array[0] = 0
array[1] = 1
array[2] = 1
array[3] = 1
array[4] = 2
...
array[400] = 2
array[401] = 1
...
array[499] = 1
To find the number of prime factors for 400, a simple, inexpensive array lookup is required.
Now, consider an arbitrary list of numbers, i.e. you still have 500 fixed numbers, but a number can be any number in the 32-bit number range. For a collision-free implementation, you could use an array of 2^32 locations, but that would be a terrible waste of memory and completely impractical. This is where Magics comes into play...
In essence, the process can be summarised as follows:
1. Given a key (our number), k, multiply this key with a magic number to obtain an index mapping (using 64 bit integers).
2. Right shift the index mapping with 64-n, where n is the size of the lookup table. Smaller n for denser tables, larger n for less-dense. The better the magic number, the smaller n will be.
3. Use this shifted value as the index in the lookup table.
The biggest question of course is where can you get this magic number from? There are techniques to do it precicely, but a simple brute-force technique will suffice in most cases. The basic idea is as follows:
Given a list of 500 numbers to hash collision-free and a value for n (larger n will be easier, smaller n harder), then:
1. while not done
2. genereate a random 64-bit magic number, m.
3. create and initialise to false, a boolean array, array[n].
4. set done to true
5. for every number, n, in the list.
6. index = (n * m) >>> (64 - n)
7. if array[index] is true (collision, try again...)
8. Set done to false
9. break
10. end-if
11. array[index] = true
12. end for
13. end-while
Hopefully some of the numbers, will hash to the same location (for example all the prime numbers could, potentially, hash to the same location as they all have only 1 prime factor). When the magic number is found, it can be used to populate the lookup table with the calculated values. This can be stored in a file for quick retreival on program startup.
As an example, consider a fixed list of 500 numbers of which the number of prime factors are required. The range of the numbers is from 0 to 2,147,483,647. For 500 numbers, it takes about 10 seconds to find a magic number for n=13 (i.e. a lookup table with 8192 locations). Denser tables, with smaller n, should be possible. The population takes a bit of time, but once done, the lookups are instantaneous.
I include a demo program with which to play. Set the BITS and the COUNT at the top to change the lookup table density and the number of numbers to use.
package gj.playbox.magics; | |
import java.util.ArrayList; | |
import java.util.Arrays; | |
import java.util.Date; | |
import java.util.List; | |
import java.util.Random; | |
/** | |
* Magic numbers demo. | |
* | |
* @author Jaco van Niekerk | |
* | |
*/ | |
public class PrimeFactorsMagics { | |
// The range of the numbers. 0..MAX | |
final static long MAX = Integer.MAX_VALUE; | |
// The density of the lookup table (up to 64). Smaller values will take | |
// longer to calculate (or forever). Memory requirements 4 * 2^13 bytes. | |
final static int BITS = 13; | |
// The number of numbers to hash. Complexity increases with more numbers. | |
final static int COUNT = 500; | |
public PrimeFactorsMagics() { | |
} | |
// Create arbitrary long values | |
public List<Long> createLongs() { | |
List<Long> longs = new ArrayList<>(); | |
for (int i = 0; i < COUNT; i++) { | |
Long value = (long) (Math.random() * MAX); | |
while (longs.contains(value)) value = (long) (Math.random() * MAX); | |
longs.add(value); | |
} | |
return longs; | |
} | |
// This is the expensive operation and thing we want. Given the original | |
// long as the key, this constitutes the value. | |
public int factorizationCount(long value) { | |
int count = 0; | |
// check for 2 | |
if (value % 2 == 0) { | |
count++; | |
} | |
while (value % 2 == 0) { | |
value /= 2; | |
} | |
long quotient = 3; | |
while (quotient <= value) { | |
if (value % quotient == 0) { | |
count++; | |
} | |
while (value % quotient == 0) { | |
value /= quotient; | |
} | |
quotient+=2; | |
} | |
return count; | |
} | |
// Give a list of longs, this searches for a compatible 64-bit magic number. | |
public long searchForMagicNumber(List<Long> longs) { | |
boolean found = false; | |
Random random = new Random(); | |
long candidateMagic = 0; | |
int best = 0; | |
while (!found) { | |
candidateMagic = random.nextLong(); | |
boolean[] hash = new boolean[1 << BITS]; | |
Arrays.fill(hash, false); // empty | |
found = true; | |
int count = 0; | |
for (int i = 0; i < longs.size(); i++) { | |
long num = longs.get(i); | |
int index = (int) ((num * candidateMagic) >>> (64 - BITS)); | |
if (!hash[index]) { | |
hash[index] = true; | |
count++; | |
} else { | |
if (count > best) { | |
best = count; | |
System.out.println(new Date() + " " + count + " / " + longs.size()); | |
} | |
found = false; | |
break; | |
} | |
} | |
} | |
return candidateMagic; | |
} | |
// After the magic number has been found, this construct the lookup-table | |
// and also verifies that no collisions are taking place. It can be a bit | |
// slow. | |
public int[] populate(List<Long> longs, long magic) { | |
int[] hash = new int[1 << BITS]; | |
Arrays.fill(hash, -1); | |
int i = 1; | |
long timer = System.currentTimeMillis(); | |
for (long num : longs) { | |
System.out.print((i++)+ "/" + longs.size() + " populating " + num + "..."); | |
int fact = factorizationCount(num); | |
System.out.println(fact); | |
int index = (int) ((num * magic) >>> (64 - BITS)); | |
if (hash[index] != -1) { | |
throw new RuntimeException("Magic number causing collisions!"); | |
} | |
hash[index] = fact; | |
} | |
System.out.println("Populating took: " + (System.currentTimeMillis() - timer) + "ms"); | |
return hash; | |
} | |
// This demonstrates the speed of the lookup table. | |
public void demo(List<Long> longs, int[] hash, long magic) { | |
long timer = System.currentTimeMillis(); | |
for (long num : longs) { | |
int valueOnLookup = hash[(int) ((num * magic) >>> (64 - BITS))]; | |
System.out.println(num + " has " + valueOnLookup + " prime factors."); | |
} | |
System.out.println("Lookup table took: " + (System.currentTimeMillis() - timer) + "ms"); | |
} | |
/** | |
* @param args | |
*/ | |
public static void main(String[] args) { | |
PrimeFactorsMagics magics = new PrimeFactorsMagics(); | |
List<Long> myNumbers = magics.createLongs(); | |
System.out.println("Searching for magic..."); | |
long magicNumber = magics.searchForMagicNumber(myNumbers); | |
System.out.println("Magic found=" + magicNumber); | |
System.out.println("Populating lookup table..."); | |
int[] hash = magics.populate(myNumbers, magicNumber); | |
System.out.println("Demonstrating..."); | |
magics.demo(myNumbers, hash, magicNumber); | |
} | |
} |