This article has been updated to reflect the changes in John version 1.7.8, released in June 2011. The most important change is that MPI support is now integrated into the jumbo patch.
The original John the Ripper off-line password cracker only uses a single processor (core) when performing brute-force or dictionary attacks.
Out of the box, JtR cannot spread its work over multiple cores or machines. However, there is a patch available that adds MPI support. MPI lets you distribute a program's workload across multiple instances, and thus across cores or even machines, but the application must support it.
The fun thing with MPI is that it is very easy to create a password cracking cluster. But for now let's just focus on using all these unused CPU cores to help us with cracking passwords.
I am using Ubuntu and Debian Linux as my platform, but Mac OS X also works perfectly.
Install MPI support
Note: Mac users have MPI support installed by default and can skip this step.
- apt-get install libopenmpi-dev openmpi-bin
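If you want to confirm that the MPI tools ended up on your PATH before continuing, a small check like the one below can help. It is only a sketch: the helper name missing_tools is my own invention, and it relies on nothing more than the shell built-in command -v.

```shell
# Print the name of every tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of John or Open MPI.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# mpicc is needed to compile John, mpirun to start it.
missing_tools mpicc mpirun
```

No output means both tools were found.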
Download John the Ripper with extra patches
- Get the john-1.7.8-jumbo-2.tar.gz file.
Extract John & edit the Makefile
- tar xzf john-1.7.8-jumbo-2.tar.gz
- cd john-1.7.8-jumbo-2/src
Uncomment the following lines in the Makefile:
CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT
MPIOBJ = john-mpi.o
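If you would rather not edit the Makefile by hand, the two lines can probably be uncommented with sed. The sketch below assumes that each line is commented out with a single leading "#", which you should verify in your copy of the Makefile first. The function name enable_mpi is hypothetical. It writes to a temporary file instead of using sed -i, because that option behaves differently on Linux and Mac OS X.

```shell
# Uncomment the two MPI lines in the given Makefile.
# Assumption: both lines start with a single '#' in the shipped Makefile.
enable_mpi() {
    makefile="$1"
    sed -e 's/^#[[:space:]]*\(CC = mpicc -DHAVE_MPI.*\)$/\1/' \
        -e 's/^#[[:space:]]*\(MPIOBJ = john-mpi\.o.*\)$/\1/' \
        "$makefile" > "$makefile.tmp" && mv "$makefile.tmp" "$makefile"
}
```

Run it as enable_mpi Makefile from the src directory, then check the result with grep -n mpi Makefile.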
Compile John the Ripper with MPI support
Run make and choose the most appropriate processor architecture. Examples:
- make linux-x86-64 (for 64-bit x86 Linux)
- make linux-x86-sse2 (for 32-bit x86 Linux)
- make macosx-x86-64 (for 64-bit Mac OS X)
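If you are not sure which target applies to your machine, the output of uname can be mapped to the three targets listed above. This is a rough sketch, not a complete list: pick_target is a hypothetical helper, the 32-bit case assumes a CPU with SSE2, and any other platform falls through with an error. Running make without arguments lists all available targets.

```shell
# Map OS and architecture to one of the three make targets from this article.
# Arguments are optional and default to the values reported by uname.
pick_target() {
    os="${1:-$(uname -s)}"
    arch="${2:-$(uname -m)}"
    case "$os/$arch" in
        Linux/x86_64)  echo linux-x86-64 ;;
        Linux/i?86)    echo linux-x86-sse2 ;;   # assumes SSE2 is available
        Darwin/x86_64) echo macosx-x86-64 ;;
        *) echo "no matching target for $os/$arch" >&2; return 1 ;;
    esac
}

# Example with explicit values:
pick_target Linux x86_64
```

On a supported machine you could then run make "$(pick_target)".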
Test John the Ripper
- cd ../run
- ./john --test
Look at the benchmark values of the first test and remember them. Now let's see if MPI does any better:
- mpirun -np [number of (virtual) processor cores] ./john --test
Let's assume that you have an iMac 27" with a Core i7 with 4 real cores and Hyper-Threading enabled. This will provide a total of 8 virtual cores.
- mpirun -np 8 ./john --test
If you notice a significant increase in performance, you know that MPI is working properly.
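If you do not know the number of virtual cores by heart, you can ask the system. The sketch below uses getconf _NPROCESSORS_ONLN, which should work on both Linux and Mac OS X. The function names are my own, and the script only prints the mpirun command instead of running it.

```shell
# Number of logical (virtual) cores currently online; falls back to 1.
logical_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Build the benchmark command for this machine without executing it.
mpi_test_command() {
    echo "mpirun -np $(logical_cores) ./john --test"
}

mpi_test_command
```

On the iMac from the example this would presumably print the same command as shown above, with -np 8.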
Some benchmarks without and with MPI support (Traditional DES)
These are the benchmark test results when using a single core on an old Nehalem Core i7 920:
Many salts: 2579K c/s real, 2579K c/s virtual
Only one salt: 2266K c/s real, 2266K c/s virtual
These are the benchmark test results when using MPI and thus all 8 cores:
Many salts: 11015K c/s real, 11015K c/s virtual
Only one salt: 9834K c/s real, 9834K c/s virtual
And just look at the performance improvement when we overclock the same CPU from 2.66 to 3.6 GHz:
Many salts: 15004K c/s real, 15004K c/s virtual
Only one salt: 13232K c/s real, 13232K c/s virtual
That is roughly 36% more throughput for a 35% higher clock speed. Now admire how the Core i7 920 @ 3.6 GHz is blown away by the Sandy Bridge-based Core i7-2600 @ 3.4 GHz:
Many salts: 20007K c/s real, 20209K c/s virtual
Only one salt: 16881K c/s real, 16881K c/s virtual
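To put the "many salts" numbers above into perspective, you can express them as ratios. The snippet below is just a bit of arithmetic with awk on the real c/s values from the benchmarks. The speedup helper is my own naming.

```shell
# Ratio of a new c/s value to a baseline c/s value, rounded to two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

speedup 2579 11015    # single core vs. MPI on the i7 920: prints 4.27
speedup 11015 15004   # stock clock vs. 3.6 GHz overclock:  prints 1.36
speedup 15004 20007   # overclocked i7 920 vs. i7-2600:     prints 1.33
```

The MPI run is a bit more than four times faster than a single core, which suggests that the four extra virtual cores from Hyper-Threading add only a little on top of the four real ones.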
Setting up an MPI cluster
MPI clustering is based on using SSH keys. There is a single master that uses all nodes to perform the computation. The nodes are put into a text file nodes.txt like this:
node01 slots=2
node02 slots=2
node03 slots=4
node04 slots=4
In this example, nodes 1 and 2 are dual-core systems, while nodes 3 and 4 have quad-core processors. You must create an account on every node with the same name as the account that runs the master process. You must also generate an SSH key pair and install the public key in the authorized_keys file on all nodes. This is outside the scope of this post. Please note that if the private key is protected by a passphrase, it should be loaded with ssh-agent; otherwise, create the key without a passphrase. If you do not use a passphrase, understand that anyone with access to the key can access all nodes.
You may also have to put the nodexx entries in your /etc/hosts file if the names cannot be resolved by DNS.
Now I'm assuming that you are able to ssh into all nodes without being asked for a password, so ssh is properly set up.
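A quick way to test this assumption is to try a non-interactive login on every node in nodes.txt. The sketch below is one possible way to do it: check_nodes is a hypothetical helper, BatchMode=yes makes ssh fail instead of prompting for a password, and the 5 second timeout is an arbitrary choice.

```shell
# Try a passwordless login on every host listed in the hostfile.
# Returns non-zero if at least one node could not be reached.
check_nodes() {
    hostfile="$1"
    failed=0
    while read -r host rest; do
        [ -z "$host" ] && continue
        # -n keeps ssh from swallowing the rest of the hostfile on stdin.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

Run it as check_nodes nodes.txt on the master. Every node should report ok before you continue.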
- mpirun -np 12 -hostfile nodes.txt ./john --test
Now you should see increased performance, beyond the limit of a single host.
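The value 12 passed to -np matches the sum of the slots entries in the example nodes.txt (2 + 2 + 4 + 4). If your hostfile is longer, you can let awk do the counting. This is only a sketch: total_slots is my own helper name, and it assumes every line uses the simple "name slots=N" layout shown earlier.

```shell
# Sum all slots=N entries of a hostfile (read from a file or from stdin).
total_slots() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^slots=[0-9]+$/) {
                split($i, kv, "=")
                sum += kv[2]
            }
    } END { print sum + 0 }' "$@"
}

# Example with the four nodes from this article: prints 12
printf '%s\n' 'node01 slots=2' 'node02 slots=2' 'node03 slots=4' 'node04 slots=4' | total_slots
```

With a real hostfile you could then run mpirun -np "$(total_slots nodes.txt)" -hostfile nodes.txt ./john --test.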
I ran a password cracking test on some data using a large dictionary. These are the performance differences when using all 8 cores of my Core i7 920 instead of just one:
single: 0:00:04:48 c/s: 11192K
mpi: 0:00:01:26 c/s: 46568K
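The performance increase is significant: the MPI run finished in 1:26 instead of 4:48, which is about 3.3 times faster, and the reported rate rose from 11192K to 46568K c/s.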
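For those who want to compare their own runs the same way, the elapsed times can be converted to seconds and divided. The sketch below assumes the time is printed as days:hours:minutes:seconds, and to_seconds is a hypothetical helper name.

```shell
# Convert a D:HH:MM:SS time string to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }'
}

single=$(to_seconds 0:00:04:48)   # 288 seconds
mpi=$(to_seconds 0:00:01:26)      # 86 seconds

# Wall-clock speedup of the MPI run over the single-core run: prints 3.35
awk -v s="$single" -v m="$mpi" 'BEGIN { printf "%.2f\n", s / m }'
```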