Monday, November 4, 2013

Magics

Hopefully the whale has not been beached... let's provide some semi-motivational content.

When it comes to fast computing, there are few things as powerful as the humble lookup-table. It comes in many sizes and flavours, implemented on the fly or through some elaborate framework (depending on the task at hand).  Sometimes it is pre-calculated and sometimes it is lazily build up during the actual operation.  This incarnation is often referred to as a cache and is a bit different than a lookup table.  However, the pre-calculated lookup table is often ignored and can often result in significant speed improvements in applications.

A lookup table can be constructed by getting some hash value from the key and then using that numerical value in a straightforward hash table (HashTable or HashMap).  The biggest problem with a hash table is collisions. Good keys and hashing strategies can help minimise the collisions, but collisions will always occur.  Bucketing and linear probing are then often used to circumvent this problem.

Today I want to discuss a rather impressive technique called "Magics", which is based on De-Bruijn sequences. It is often used in chess engines and is currently responsible for the fastest move generators in existence.  It is used by the majority of top chess engines and gives them a real compatitive edge.  Fortunately, the same concept can also be applied to different applications. It does tend to be a bit niche, but so is many highly effective algorithms.

Consider the following problem:  You have a list of 500 numbers and you need to calculate a very complex operation on each number.  We'll use something boring like prime factorization for this discussion.  The numbers are all withing the range 0..499.  How would you do it?  This answer is trivial, of course.  A simple array with the number as the index value will give a collision-free lookup technique, something like:

array[0] = 0
array[1] = 1
array[2] = 1
array[3] = 1
array[4] = 2
...
array[400] = 2
array[401] = 1
...
array[499] = 1

To find the number of prime factors for 400, a simple, inexpensive array lookup is required.

Now, consider an arbitrary list of numbers, i.e. you still have 500 fixed numbers, but a number can be any number in the 32-bit number range. For a collision-free implementation, you could use an array of 2^32 locations, but that would be a terrible waste of memory and completely impractical. This is where Magics comes into play...

In essence, the process can be summarised as follows:

1.  Given a key (our number), k, multiply this key with a magic number to obtain an index mapping (using 64 bit integers). 

2. Right shift the index mapping with 64-n, where n is the size of the lookup table.  Smaller n for denser tables, larger n for less-dense.  The better the magic number, the smaller n will be.

3.  Use this shifted value as the index in the lookup table.

The biggest question of course is where can you get this magic number from?  There are techniques to do it precicely, but a simple brute-force technique will suffice in most cases.  The basic idea is as follows:

Given a list of 500 numbers to hash collision-free and a value for n  (larger n will be easier, smaller n harder), then:

1.  while not done
2.    genereate a random 64-bit magic number, m.
3.    create and initialise to false, a boolean array, array[n].
4.    set done to true
5.    for every number, n, in the list.
6.      index = (n * m) >>> (64 - n)
7.      if array[index] is true (collision, try again...)
8.        Set done to false
9.        break
10.     end-if
11.     array[index] = true
12.   end for
13.  end-while    

Hopefully some of the numbers, will hash to the same location (for example all the prime numbers could, potentially, hash to the same location as they all have only 1 prime factor). When the magic number is found, it can be used to populate the lookup table with the calculated values.  This can be stored in a file for quick retreival on program startup.

As an example, consider a fixed list of 500 numbers of which the number of prime factors are required.  The range of the numbers is from 0 to 2,147,483,647. For 500 numbers, it takes about 10 seconds to find a magic number for n=13 (i.e. a lookup table with 8192 locations).  Denser tables, with smaller n, should be possible.  The population takes a bit of time, but once done, the lookups are instantaneous.

I include a demo program with which to play.  Set the BITS and the COUNT at the top to change the lookup table density and the number of numbers to use.

Wednesday, April 17, 2013

Haxe - One language, everywhere?

This is just a quick post to inform you guys of Haxe. A new multiplatform programming language claiming it
"can be compiled to all popular programming platforms with its fast compiler – JavaScript, Flash, NekoVM, PHP, C++, C# and Java (soon) – which means your apps will support all popular mobile devices, such as iOS, Android, Windows Mobile, webOS and more."
Here are some quick links to its Features and Documentation. Post your thoughts in the comments section.

Thursday, January 24, 2013

An Unfortunate Conversation

This is a brief extract of a conversation I had with myself today.  My sincere wish is that you'll never have this kind of conversation with yourself.  I was playing around with ActiveMQ's persistence (see http://activemq.apache.org) after getting basic queuing working...

Me: "How do I get persistence to work properly?" (I start googling and get some pointers...)
Me: "Oh, this is easy.." (sounds of typing)

    KahaPersistenceAdaptor adapter = new KahaPersistenceAdaptor();

Me: "Hmmm... ah, just dump the database in my home directory   for now..."

    adaptor.setDirectory("~");
    ...

Me: "Ok, now I just need to set it up... cool."
(Some more typing and thoughts of leaving early...) 

Me: "Okay, let's see what happens..." (clicks run).
Me: "Cool! Yes! It works... let's see how the database looks."

    $ cd ~
    $ ls -la

    <Home directory listing here, without the database file(s) I expected...>

Me: "That's funny... what does the project directory look like?"

    $ cd ~/workspace/activemq-test
    $ ls -la

   total 12
   drw-r--r-- 1 jaco 4096 Jan 24 06:58 ~
   -rw-r--r-- 1 jaco 1924 Jan 24 06:58 pom.xml
   <some other files>

Me: "Hmmm... What the hell? What is that ~ doing there..."

    $ rm ~
    rm: cannot remove `~': It is a directory

Me: "Ag... dammit... just delete the damn thing!"
(At this point reason and intellect was replaced by the primordial aggressive hunter/gatherer)

    $ rm -rf ~

(Reason and intellect reasserts itself in my recently installed cerebral cortex...albeit, just briefly.)
Me: "...."

   $ cd ~
   $ ls -la

   total 0

(I felt the hair in my neck stand on-end, that eery sensation and that metal taste... At this point I experienced a complete take-over of the medulla & cerebellum part of the brain, rendering me completely and utterly speechless...)

Me: "..."
Me: "..."
(sounds of soft crying...)

After a few hours of trying to get my system in a semi-working state (and realising that the program was never checked in anywhere), I vow never to use rm again.

Tuesday, January 22, 2013

Juniper Network Connect VPN client

In addition to using the Cisco VPN client at work we also need to make use of the Juniper Network Connect (JNC) VPN client at times. After struggling to get it working and reading a lot of blog posts, including this one which is important for getting hold of JNC via a browser, I discovered the MadScientist's blog post. Fortunately I discovered that I don't really need his msjnc perl script even though it has helped me a lot. It is however required for proper initialisation/setup of JNC. This post will explain what I have learnt and hopefully it will also help someone else (if not, then I have at least documented it for myself).

After following the blog post I was still unable to get it working, because:

  1. My computer is behind a proxy through which I am supposed to access the VPN. Unfortunately the blog post does not go into any details regarding proxies.
  2. When setting up a profile using the  msjnc front-end it tries to download an X.509 certificate using the ~/.juniper_networks/getx509certificate.sh script which tries to execute

    openssl s_client -connect $1:443 < /dev/null 1>out.txt 2>err.txt

    This failed for me, because I am behind a proxy and openssl does not support proxies. The workaround for this was to connect directly to the Internet (using a different network) and then configure a new profile from scratch. Once the certificate is downloaded it can be reused - even when behind a proxy as will be seen later. The certificate is downloaded to ~/.juniper_networks/.cert.<vpn.host>
  3. I believe there are a few bugs in the msjnc front-end when configuring the VPN proxy server hostname:port pair. Whenever I entered something like 10.0.0.10:8080 it complained that it was not valid. This results in incorrect settings in the ~/.msjnc-profiles.cfg file. I should probably fork the GibHub project and send a patch instead of just complaining :-)

After fidgeting a lot I noticed the command that msjnc tries to execute in the ~/.msjnc.log file. I copied+pasted this into a terminal and voila!

When connected directly to the Internet the following command works for me:

~/.juniper_networks/network_connect/ncsvc -h <vpn.host> -u <vpn.username> -r <realm> -P 443 -U https://<vpn.host> -f ~/.juniper_networks/.cert.<vpn.host>

where:

  • vpn.host is the hostname or IP of the VPN server
  • vpn.username is your username as provided by your "network administrator"
  • realm is the realm of the VPN you are connecting to (see the introductory comments of the msjnc script on how to determine your realm)

When behind a proxy the following command works for me:

~/.juniper_networks/network_connect/ncsvc -h <vpn.host> -u <vpn.username> -r <realm> -P 443 -U https://<vpn.host> -y <proxy.host> -z <proxy.port> -d <proxy.domain> -s <proxy.username> -a <proxy.password> -f ~/.juniper_networks/.cert.<vpn.host>

where:

  • the above arguments are still valid; and
  • proxy.host is your proxy hostname or IP
  • proxy.port is your proxy port
  • proxy.domain is the Active Directory DOMAIN (I think), such as WORKGROUP
  • proxy.username is your proxy username
  • proxy.password is your proxy password

What I like about this solution is that once it is setup you do not need to jump through all the Java hoops or even visit your VPN's web site. You can just run the command, type in your password when prompted and then you are connected to the Juniper VPN. You also don't need the msjnc script any more, but I'll keep it around just in case.

Tuesday, January 15, 2013

Celebrating our Machines

I want us to think outside computers for a while, and think of the incredible machines that are all around us, but that we don't always appreciate. Machines are awesome, and they should all be celebrated.

Let's ponder our motor vehicles for a while.

And specifically, my old car. The Golf IV that was around for 12+ years, and 310,000 km.

This post is sort of a tribute to that machine. A machine that was close to my heart, even though I couldn't chop and change it, or modify it, like I can with computers. I just drove it. And that's probably better - because I know almost nothing about the mechanical workings of a motor vehicle, and if I were to tinker it probably wouldn't have lasted :)

Lets look at a couple of stats:

  • Drove for 12 years
  • +- 310,000 km
  • Was driven basically every day in its lifetime, except for 4 or 5 holidays here and there
  • Comfortably reached 120km/h

Let's dig into those numbers:

310,000 km
  • This is about 7.75 times around the equator.
  • This is 80% of the distance to the moon (my biggest regret is that the car didn't make it to the moon. It probably could have.)
  • This is about 15 times from Cape Town to Cairo, and back
  • It's about 30 times around the moon at the moon's equator
  • It's about 15 times around Mars' equator
  • It's 3 times from one side of Jupiter's Red Spot, and back
12 years
  • Assuming 6 two-way trips for every week in the 12 years, this means the driver door was opened and closed around 15,000 times. (12 * 52 * 6 * 2 * 2)
  • Assuming a lifetime average speed of 40km/h, the engine was on and the car running for around 7,800 hours (325 days). An average of 30km/h is 10,300 hours (430 days) (310,000 / 40)
  • Assuming an average RPM of around 2,500 for the 7,800 hours, the total revolutions of the engine come to 1,170,000,000 (2,500 * 60 * 7,800)
  • How many sparks does it take to get a 4-stroke engine to do the above # of revs? I think it'll be really interesting to work this out
  • Assuming a lifetime average fuel consumption of 8l/100km, this gives around 25,000l of fuel. You work out how much that would cost at current fuel prices.
120 km/h
  • Ignoring walking, for the majority of mankind's existence, our main transportation were horses. Which walk at an average of 7km/h. Humans being able to travel at 120km/h in their own personal vehicle would seem like magic to people from those days

I sometimes feel as though I am too electronics-orientated. Which is understandable considering my career and life-long interest in computers - but I reckon it's important not to forget all the amazing machines around us that we sometimes take for granted in our electronics-focused lives. Like our cars.

What are some other incredible machines in our daily lives that we may not always recognise? What statistics can we think of for these machines to put what they do into perspective?

Monday, January 7, 2013

Oh Windows...

After switching to Linux a couple of years ago (thanks to the every-so-friendly encouragement of the co-authors of this blog), I had more and more difficulty with Windows.  It is as if Windows' hardware auto-detect is so advanced it detects the Linux presence in your wetware processor and enables the  pain-generator-module to bring some joy to your day.

Having moved to a new employer, the first thing I did at my new job was to install Linux.  During the installation process I notice that I do not have any Internet access. "No problem", I thought, "it's only the proxy..." (this being semi-corporate, after-all).  So after the install I booted Windows, went to Internet Explorer to get the proxy details.  This is how it looks:



That was painless! So I booted back into Linux and set my global proxy... and then... nothing.  I then continued to spend a full two and a half days trying to get my Linux box to log onto the Windows domain (my assumption being that that is a requirement to gain Internet access around here).  I first tried Samba... but had no joy. I suspected it had to be a setting somewhere that a missed.  I then tried 'likewise' (https://help.ubuntu.com/8.04/serverguide/likewise-open.html) after being recommended by a couple of websites.  No luck... I ran diagnostics tools, pulled and sacrificed some hair to the Windows gods, fiddled some more with Samba, cried a little, searched the Interwebs, cried a lot, sacrificed some more hair, no luck...

Finally I mailed the one other person that I knew ran Linux.  "You only need the proxy... no need to log onto the domain." was his reply.  I started to get that metal-taste in my mouth and felt the hair in my neck rise... back into Windows... and yip, to my horror, the textfield was too small to accommodate the IP.  Logged back into Linux, set the correct IP and viola! Internet is working...


I hate Windows... (and the number 9 a little bit too)

Wednesday, September 5, 2012

Gamification of day to day development work

I've been thinking a lot today about how to help developers write solid code, and write good tests, and perhaps even have some fun in the process. So what about achievements for your day to day work?

I know Visual Studio has done something like this in the past, but it's been for C++ (or C#), and as far as I understand, it's been developer centric, not project centric.

The problem with the developer centric approach is that it inherently leaves out people, and in my opinion doesn't promote the health of the project as a whole. I think a combination of the two seems to be in order, so there should be developer-specific achievements, but also project-specific achievements.

I'm really keen to develop something like this. I was recently bitten by the fact that I stopped writing unit tests (Because of time pressure, the usual BS excuse) and now of course my mind has been racing on how to stop this trend. Now I'm not that naive that I'll think something like this will solve all the worlds shitty software problems, but, if it helps even a little bit and people have fun in the process then why not?

Here's some achievements I've come up with, please feel free to add some or comment on them:

Project Achievements

  • Have to start somewhere: 5% test coverage
  • It’s improving: 10% test coverage
  • Getting better still: 30% test coverage
  • Works on my machine: 50% test coverage
  • Getting pretty solid: 75% test coverage
  • Should work: 80% test coverage
  • Will work: 90% test coverage
  • Specs may be wrong, but it works: 95% test coverage
  • The first cut is (not always) the deepest: First unit test gets added.
  • Single celled organism: Project has been active for one month
  • Fetus: Project has been active for two months
  • Baby: Project has been active for three months
  • Toddler: Project has been active for 5 months
  • Child: Project has been active for 1 year
  • Teenager: Project has been active for 1.5 years
  • Adult: Project has been active for 2 years
  • Middle Aged: Project has been active for 3 years
  • Twilight Years: Project has been active for 4 years
  • It’s not all about the code: Project gets its first “resource” file (ie. src/main/resources for Java)
  • Blink and you’ll miss it: Project takes less than 5 seconds to build
  • Watching paint dry: project takes longer than 1 minute to build
  • Don’t tell Han Solo: More than 1 developer has worked on the project
  • Three’s a crowd: More than 3 developers has worked on the project
  • It’s more fun this way: More than 5 developers has worked on the project
  • Polygamy : More than 10 developers has worked on the project
  • This is serious: Project gets its first tag
  • We can like to be doing configuration management: More than 5 tags
  • We know what’s out there: More than 20 tags
  • We’ve got a release coming up: More than 5 tags in a week
  • We’re trying to fix it (or it’s growing quickly) [this could depend on age of project]: More than 10 commits in a single day
  • It’s not working: Same as above, but for 5 commits

Developer Achievements

  • Adding functionality by removing code: LOC get less, but test coverage increases.
  • Removing a dependency: A dependency is removed from the project.
  • Getting it Back on track: Added a first test after a long period of no tests being added
  • Likes it pretty: Made a commit with only whitespace changes
  • Leave is in the bank: 15 continous days of commits
  • The Creator: First commit for a new project
  • Dabbling: Developer worked on more than 1 project
  • Branching out: Developer worked on more than 3 projects
  • Part of the furniture: Developer worked on more than 10 projects
  • Letting go of the shackles: Developer worked in more than 1 language
  • Versatile: Developer worked in more than 3 languages
  • It’s all the same: Developer worked in more than 5 languages
  • Burning the midnight oil: Developer pushed a commit between 12:00 and 05:00 am
  • Who needs to eat? Developer pushed a commit between 12:00 am and 13:00 pm
  • No Youtube: Developer pushed more than 10 commits in one day
  • Hard worker: Developer pushed more than 20 commits in one day
  • Must be a bot! Developer pushed more than 50 commits in one day
  • Diligent: 2 commits in a row where a test is added
  • Knows what he’s doing: 3 commits in a row where a test is added
  • Consistent: 2 commits in a row with a good commit message
  • Taking it to heart: 5 commits in a row with a good commit message
  • Shakespeare: 10 commits in a row with a good commit message
  • Give him the documentation work! 20 commits in a row with a good commit message
  • I’m focused: More than one month commiting to the same project