Genetic Boggle

top What is Genetic Boggle?

Genetic Boggle is a program that uses a genetic algorithm to encode as many strings as possible from a given list into a grid of letters using Boggle encoding. It starts with a set (called a generation) of randomly generated grids and applies fitness rules to them. The fittest grids mate and spawn a new generation of grids. This simple procedure is repeated over and over. As if by magic, a result is obtained that would be very difficult to program using a more conventional analytical approach.

top How do I get the program?

If you want to run the program, download the executable file (Windows only).
Try it out on any ASCII list of words, or just save my list.
If you want to see the C++ code, get this all-inclusive zip file.

top How do I use the program?

  1. Press the LOAD button to read in a lexicon.
  2. Press the GO button to start the process. When started
  3. Press the STOP button to pause the process. When paused
  4. Adjust the mutation rate at any time.

top What is Boggle encoding?

Boggle is a game in which letters are randomly placed into a grid and players have to find the words that are present in that grid. Since it is unlikely that random letter tiles (the game actually uses cubes) will show a word spelled from beginning to end in left-to-right order, these generalized rules are available for finding words: This means the grid
F E
T Z
encodes the word fete but not the word feet. The word fete is said to be boggle-encoded in this grid.

top How does the genetic algorithm work?

Terms:
grid is an 8x8 grid of letters (a member of generation).
gridx,y is the letter at position (x,y) in grid grid.
f(grid) is the fitness of a grid. It equals the number of lexicon strings currently encoded in the grid.
f(gridx,y) is the fitness of the letter at position (x,y). It equals the total number of times that letter is used to encode lexicon strings.
  1. A set of strings is read from a file, one per line. This is the lexicon. Only words from the lexicon will be counted in determining fitness.
  2. A generation of 105 grids of 64 letters (8x8) each is created and filled with random characters.
  3. A fitness test is performed on all members of the generation to calculate f(grid).
  4. The 15 most fit grids are chosen. If f(grid) > f(bestgrid) then
    1. bestgrid = grid
    2. Display grid.
  5. The fitness of every character in each of the 15 fittest grids (called parents) is determined.
  6. The parents are paired in every possible combination to produce 105 children (15 choose 2 = 105). The childx,y = f(momx,y) > f(dadx,y) ? momx,y : dadx,y. That is, take either the letter from one parent or the other based on which letter is used more.
    1. When f(momx,y) = f(dadx,y) AND f(momx,y) > 0, randomly choose either momx,y or dadx,y. This will achieve crossover.
    2. When f(momx,y) = f(dadx,y) AND f(momx,y) = 0, choose a random letter. This will serve as a mutator.
  7. Apply an additional mutator to drive the grid away from local minima:
    Loop 0 to mutation setting
    { randomly choose a grid
      randomly choose a position (x,y)
      set gridx,y to a random letter. }
    Mutation setting can be adjusted from 0 (no mutation except when the letter is not used--see 6b) to 5000. The highest setting was determined empirically to be a deadly dose--f(grid) quickly drops to near 0.
  8. Repeat steps 3 - 7 until the user hits the STOP button or all strings are encoded.

top Does the program do anything clever?

Once you read and understand the algorithm above it is easy to believe the program must have been written by chimps. I prefer to think of its simplicity as "elegance" rather than "idiocy." The proof is, of course, that this scheme does (miraculously) work.
There are a few clever things done in the design of the program to make each generation proceed more quickly. The reason is simple. This program is generally massively compute bound. It runs for days or weeks. Any savings in time is multiplied by millions of iterations. There are also some clever things done to make the code easier to use:

top Who came up with this idea?

I wrote the original program to produce a solution to a puzzle contest in the December 1996 Games Magazine. The contest (called Gridlock) was to encode as many game titles as possible into an 8 by 8 grid of letters. My best solution using this algorithm was 32 titles (achieved after a 20 hour run). This was an impressive 84% of optimal: the human winner (not me) was able to encode 38 titles.
The genetic algorithm used here is my own design. Given the sheer simple-mindedness of the algorithm, it is amazing it works at all. But what do you know? It works well. It clearly illustrates the power of genetic programming. Instead of programming for weeks and running the program for seconds and getting a good answer, I programmed for days and ran the program for days to get a good answer. In many cases this is an excellent tradeoff.
The simpler brute-force method--try every possible combination--is not possible here: there are ~1075 give or take a few billion billion billion trillion (rotations and reflections).

top Isn't Boggle® a registered trademark?

Yes. Boggle® is a registered trademark of Parker Brothers, a division of Hasbro, Inc. They "provide children and families with the highest quality and most recognizable toys and games in the world." In fact, so recognizable, the easiest way to explain the kind of encoding used here is to refer to this popular and ubiquitous game.
While I believe Boggle® is a high quality and recognizable game, I have no connection with Hasbro, Inc. and am using this trademark without permission. Shame on me.

top
Send email to guy@professorguy.com

© Copyright 2004 Guy T. Schafer
Last updated August, 2011