July 1, 2013

Scrambling Letters


Ever since I saw my first Jumble word puzzle in the newspaper as a child, I’ve always wanted to write an algorithm to solve it. This inkling was lately renewed when I came across the 4 Pics 1 Word puzzle game. I had an idea of the simple brute-force algorithm that could solve the problem, so I went about implementing it in code. You can play with the final version here. I’ve limited it to only search for words that are found in 4 Pics 1 Word to limit the number of possible answers it finds.

The algorithm is easiest to implement recursively. If you know the number of letters in the answer (r), and you have a list of available letters (n), then you simply place one letter at a time in each available spot and exhaust all possible permutations. There are going to be a total of nPr possible permutations. So it’s best to limit it to smaller values of n and r. Once you have all the possible combinations of ‘words’, you can use a dictionary to check which ones of your ‘words’ are actual English words. The dictionary can just be a hash table and you can check if your ‘word’ exists in the table.

The Algorithm

Although the brute-force algorithm is trivial to describe, implementing it properly is another story. I ended up with the following function. The base case occurs when there is only spot remaining in the answer. When this is the case, the function returns a list of all the unused letters that can be placed in that spot. When not in the base case, the function will fill the first available spot with a letter and then recursively find all possible combinations for the remaining spots using the remaining letters. It does this for all possible letters that could take up the first available spot. Here’s the python code for it:

def getWordsRecursive(spot, remainingLetters, numSpots):
    if spot == (numSpots - 1):
        words = []
        for letter in remainingLetters:
            words += [letter]
        return words
    words = []
    for i in range(len(remainingLetters)):
        letter = remainingLetters[i]
        suffixes = getWordsRecursive(spot+1, remainingLetters[:i] + remainingLetters[(i+1):], numSpots)
        for suffix in suffixes:
            word = letter + suffix
            words += [word]
    return words

Calling getWordsRecursive(0, 'ABCDEFGHIJKL', 5) will give you all the possible 5 letter combinations that can be created from the letters A-L.

Making it faster

Although this works, it is horribly slow. I didn’t mind it too much at first since I was able to get most answers on my local machine within 0-30 seconds. But when I uploaded this implementation to my shared host server, it was taking over 5 minutes for some queries and my server would automatically kill the process when it took that long. So I set about finding ways of making this faster. One of my first thoughts was implementing the algorithm iteratively since I thought that the deep use of the stack used by the recursive algorithm might be the reason for the server killing the process. But after a little more thought, I realized that I could use the extra knowledge I had about this particular problem domain to make things more efficient while keeping the simple recursive function.

Since I am only interested in finding valid English words, I can use this information to cut the search space drastically. Let’s say that once I have determined all possible combinations for the first 3 letters in the answer, I can check with my dictionary if there are any English words that start with those 3 letters. If there are none, then I do not need to find the remaining possible combinations for those 3 letter combinations. Here’s a python function putting that information to use:

def getWordsFaster(pSpot, pLetters, pNumSpots, substringStart=0):
    if pNumSpots <= 3:
        return getWordsRecursive(0, pLetters, pNumSpots)
    firstHalfLen = len(pLetters[:pNumSpots/2])
    prefixes = getWordsRecursive(0, pLetters, firstHalfLen)
    prefixes = getDictWords(prefixes, substringStart=substringStart, substringLen=firstHalfLen)
    words = []
    for prefix in prefixes:
        remainingLetters = pLetters
        for letter in prefix:
            i = remainingLetters.find(letter)
            remainingLetters = remainingLetters[:i] + remainingLetters[i+1:]
        suffixes = getWordsFaster(0, remainingLetters, pNumSpots - firstHalfLen, firstHalfLen)
        for suffix in suffixes:
            words.append(prefix + suffix) 
    return words

If the number of spots in the answer is 3 or fewer, then I simply resort to the first function since it is fast enough. For more spots, I first get all the possible combinations that could fit in the first half part of the answer using the first function. (The splitting could be made more granular based on how big of an output you are supporting. I have restricted my output to a maximum of 8 letters (not shown here), so just cutting in half was good enough). Then I use the getDictWords() function (also not shown here) to filter out all those combinations that can’t possibly be English words. Now for all the remaining valid combinations, I call this function recursively to return all combinations that start with those valid prefixes. Once I have the final list, I run it again through getDictWords() to get my final filtered list of English words.


This was a fun little problem to solve that looked deceptively simple, yet wasn’t as trivial to implement as I initially thought it would be. Finding a way to cut down the search space made a drastic improvement in the algorithm speed. Most queries I have tried finish up in under 1 minute on the server now. Try it out for yourself. I also released an Android app that uses this algorithm to help you find answers to the ‘4 Pics 1 Word’ word puzzle game. The app additionally uses OCR to detect the available letters and the number of spots in the answer so the user doesn’t have to input any data at all, but simply provide a screenshot of the puzzle.

Leave a Reply