Parallel / Distributed Password Cracking With John the Ripper and MPI

Sat 05 February 2011 Category: Security

This article has been updated to reflect the changes for John version 1.7.8 as released in June 2011. The most important change is that MPI support is now integrated into the jumbo patch.

The stock John the Ripper off-line password cracker uses only a single processor core when performing brute-force or dictionary attacks.

It does not use multiple cores (or machines) by itself. However, there is a patch available that adds support for MPI. MPI (Message Passing Interface) lets you distribute the workload of a program across multiple instances, and thus across cores or even machines, provided the application supports it.

The fun thing with MPI is that it is very easy to create a password cracking cluster. But for now let's just focus on using all these unused CPU cores to help us with cracking passwords.

I am using Ubuntu and Debian Linux as my platform, but Mac OS X also works perfectly.

install MPI support

Note: Mac OS X ships with MPI support by default, so Mac users can skip this step.

  • apt-get install libopenmpi-dev openmpi-bin
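Whether the toolchain landed correctly can be checked before building anything; this small loop only assumes the standard `mpicc`/`mpirun` wrapper names that OpenMPI installs:

```shell
# Check that the OpenMPI compiler wrapper and launcher are on the PATH.
for tool in mpicc mpirun; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: missing -- install libopenmpi-dev/openmpi-bin first"
    fi
done
```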

download John the Ripper with extra patches

  • Get the john-1.7.8-jumbo-2.tar.gz file.

extract John & edit the Makefile

  • tar xzf john-1.7.8-jumbo-2.tar.gz
  • cd john-1.7.8-jumbo-2/src
  • uncomment the following lines in the Makefile:
    CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT
    MPIOBJ = john-mpi.o
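If you prefer not to edit by hand, the two lines can be uncommented with sed. This is a sketch that assumes they are commented out with a single leading `#`, as in the 1.7.8 jumbo tarball:

```shell
# Run from john-1.7.8-jumbo-2/src. Strips the leading "#" from the two
# MPI-related Makefile lines, then prints them to confirm.
sed -i \
    -e 's/^#CC = mpicc/CC = mpicc/' \
    -e 's/^#MPIOBJ = john-mpi.o/MPIOBJ = john-mpi.o/' \
    Makefile
grep -E '^(CC = mpicc|MPIOBJ = john-mpi.o)' Makefile
```

(On Mac OS X, BSD sed wants `sed -i ''` instead of `sed -i`.)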
    

Compile John the Ripper with MPI support

  • Run make and choose the most appropriate processor architecture. Example:
    make linux-x86-64 (for 64-bit x86-64)
    make linux-x86-sse2 (for 32-bit i386 with SSE2)
    make macosx-x86-64 (for 64-bit Mac OS X)
    

Test John the Ripper

  • cd ../run
  • ./john --test

Look at the benchmark values of the first test and remember them. Now let's see if MPI does any better:

  • mpirun -np [number of processor (virtual) cores] ./john --test

Let's assume that you have an iMac 27" with a Core i7 with 4 physical cores and Hyper-Threading enabled. This provides a total of 8 virtual cores.

  • mpirun -np 8 ./john --test
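Instead of guessing, the core count can be read from the operating system; `nproc` is part of GNU coreutils on Linux (on Mac OS X, `sysctl -n hw.ncpu` reports the same number):

```shell
# Detect the number of online (virtual) cores and pass it to mpirun.
CORES=$(nproc)                  # Mac OS X: CORES=$(sysctl -n hw.ncpu)
echo "Detected $CORES virtual cores"
mpirun -np "$CORES" ./john --test
```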

If you notice a significant increase in performance, you know that MPI is working properly.

Some benchmarks without and with MPI support (Traditional DES)

These are the benchmark test results when using a single core on an old Nehalem Core i7 920:

Many salts: 2579K c/s real, 2579K c/s virtual
Only one salt:  2266K c/s real, 2266K c/s virtual

These are the benchmark test results when using MPI and thus all 8 cores:

Many salts: 11015K c/s real, 11015K c/s virtual
Only one salt:  9834K c/s real, 9834K c/s virtual

And just look at the performance improvement when we overclock from 2.66 to 3.6 GHz:

Many salts: 15004K c/s real, 15004K c/s virtual
Only one salt:  13232K c/s real, 13232K c/s virtual

That is very significant. Now admire how the Core i7 920 @ 3.6 GHz is blown away by the Sandy Bridge-based Core i7-2600 @ 3.4 GHz:

Many salts: 20007K c/s real, 20209K c/s virtual
Only one salt:  16881K c/s real, 16881K c/s virtual

Setting up an MPI cluster

MPI clustering relies on key-based SSH logins. There is a single master that uses all nodes to perform the computation. The nodes are listed in a text file nodes.txt like this:

node01  slots=2
node02  slots=2
node03  slots=4 
node04  slots=4

In this example, node 1 and 2 are dual-core systems, while node 3 and 4 have quad-core processors. On every node, create an account with the same name as the account that runs the master process. You must also generate an SSH key pair and distribute the public part to all nodes as the authorized_keys file; the details are outside the scope of this post. If the private key is protected with a passphrase, load it with ssh-agent; otherwise use a key without a passphrase, but understand that anyone with access to that key can log in to all nodes.
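For completeness, a minimal sketch of that key setup; the key file name is an example, and ssh-copy-id assumes password login still works once per node:

```shell
# Generate a key pair without a passphrase (see the caveat above) ...
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa_cluster
# ... and append the public half to authorized_keys on every node:
for node in node01 node02 node03 node04; do
    ssh-copy-id -i ~/.ssh/id_rsa_cluster.pub "$node"
done
```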

You may also have to put the nodexx entries in your /etc/hosts file if the names cannot be resolved by DNS.

I'm now assuming that you can ssh into all nodes without being asked for a password, i.e. that SSH is set up properly.
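A quick way to verify this is to run a trivial command on every node with `BatchMode=yes`, which makes ssh fail instead of falling back to a password prompt; the node names match the hostfile above:

```shell
# Each node should print its hostname; anything else is reported as a failure.
for node in node01 node02 node03 node04; do
    ssh -o BatchMode=yes "$node" hostname \
        || echo "$node: passwordless login NOT working"
done
```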

  • mpirun -np 12 -hostfile nodes.txt ./john --test

Now you should see increased performance, beyond the limit of a single host.

Some benchmarks

I ran a password cracking test on some data using a large dictionary. These are the performance differences when using all 8 cores of my Core i7 920 instead of just one:

single: 0:00:04:48      c/s: 11192K
mpi:    0:00:01:26      c/s: 46568K

The performance increase is significant.
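The dictionary run itself is launched the same way as the benchmark; the hash and wordlist file names below are placeholders for your own files:

```shell
# Dictionary attack across all 8 cores; passwd.txt holds the hashes and
# wordlist.txt the dictionary -- both names are examples.
mpirun -np 8 ./john --wordlist=wordlist.txt passwd.txt
# Show whatever has been cracked so far:
./john --show passwd.txt
```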
