Genetic Boggle

top What is Genetic Boggle?

Genetic Boggle is a program that uses a genetic algorithm to encode as many strings as possible from a given list into a grid of letters using Boggle encoding. It starts with a set (called a generation) of randomly generated grids and applies fitness rules to them. The fittest grids mate and spawn a new generation of grids. This simple procedure is repeated over and over. As if by magic, a result is obtained that would be very difficult to program using a more conventional analytical approach.

top How do I use the program?

  1. Press the LOAD button to read in a lexicon.
  2. Press the GO button to start the process. When started
    • The generation number will increase to show progress.
    • A graph will display showing the fitness of the best grid from every generation.
    • The best grid that has ever been encountered will display in the "Best of Breed" display. This will change whenever the grid displayed is bested.
    • The GO button becomes a START AGAIN button, but is not available.
  3. Press the STOP button to pause the process. When paused
    • The RESUME button becomes available. Press it to resume operation with the current generation.
    • The START AGAIN button becomes available. Press it to restart from generation 0 with new random grids.
    • Press the SAVE button to save the best grid to a text file.
  4. Adjust the mutation rate at any time.

top What is Boggle encoding?

Boggle is a game in which letters are randomly placed into a grid and players have to find the words that are present in that grid. Since it is unlikely that random letter tiles (the game actually uses cubes) will show a word spelled from beginning to end in left-to-right order, these generalized rules are available for finding words:
  • Letters which are adjacent in the word must be adjacent in the grid.
  • A letter L1 is said to be adjacent in the grid to a letter L2 if and only if L2 is in one of the 8 grid locations which immediately surround L1.
  • Letters in the grid may be used more than once, but not consecutively.
This means the grid
F E
T Z
encodes the word fete but not the word feet. The word fete is said to be boggle-encoded in this grid.

top How does the genetic algorithm work?

Terms:
grid is an entire grid (a member of generation).
gridx,y is the character at position (x,y) in grid grid.
  1. A set of strings is read from a file, one per line. This is the lexicon. Only words from the lexicon will be counted in determining fitness.
  2. A generation of 105 grids is created and filled with random characters.
  3. A fitness test is performed on all members of the generation. The fitness f(grid) equals the number of lexicon strings currently encoded in the grid.
  4. The 15 most fit grids are chosen. If f(grid) > f(bestgrid) then
    1. bestgrid = grid
    2. Display grid.
  5. The fitness of every character in each of the 15 fittest grids (parents) is determined. The fitness of a character f(gridx,y) equals the total number of times that character is used for encoding.
  6. The parents are paired in every possible combination to produce 105 children (15 choose 2 = 105). The childx,y = f(momx,y) > f(dadx,y) ? momx,y : dadx,y. That is, take either the letter from one parent or the other based on which letter is used more.
    1. When f(momx,y) = f(dadx,y) AND f(momx,y) > 0, randomly choose either momx,y or dadx,y. This will achieve crossover.
    2. When f(momx,y) = f(dadx,y) AND f(momx,y) = 0, choose a random letter. This will serve as a mutator.
  7. Apply an additional mutator to drive the grid away from local minima:
    Loop 0 to mutation setting
    { randomly choose a grid
      randomly choose a position (x,y)
      set gridx,y to a random letter. }
    Mutation setting can be adjusted from 0 (no mutation except when the letter is not used--see 6b) to 100. The highest setting was determined empirically to be a deadly dose--f(grid) quickly drops to near 0.
  8. Repeat steps 3 - 7 until the user hits the STOP button or all strings are encoded.

top Does the program do anything clever?

Once you read and understand the algorithm above it is easy to believe the program must have been written by chimps. I prefer to think of its simplicity as "elegance" rather than "idiocy." The proof is, of course, that this scheme does (miraculously) work.
There are a few clever things done in the design of the program to make each generation proceed more quickly. The reason is simple. This program is generally massively compute bound. It runs for days or weeks. Any savings in time is multiplied by millions of iterations.
  • There are two arrays that each can hold an entire generation of grids. The odd numbered generations are kept in one, the evens in the other. This prevents the parents from having to "get out of the way" of the children which will inhabit the next generation. It saves the program from having to copy the fairly complicated objects to some holding location.
  • Tests for odd or even generations are done outside of the processing loops even though it makes the code look nasty. Instead of the tidy
    	for (x...)
    	    for (y...)
    		if (even)
    		    do even stuff
    		else
    		    do odd stuff
    the program is written
    	if (even)
    	{
    	    for (x...)
    		for (y...)
    		    do even stuff
    	} else {
    	    for (x...)
    		for (y...)
    		    do odd stuff
    	}
    This cuts down the total number of operations. Good compilers are supposed to do this optimization automatically. I don't leave it to chance.
There are also some clever things done to make the code easier to use:
  • A template is used for members of a generation. The grid class is given to this template. This code could be quickly modified to use different objects as members of the generations.
  • The Generation object has an overloaded ++ operator. This has the obvious effect of iterating to (and thus creating) the next generation.
  • The Grid object has an overloaded * operator. This is used to mate two grids to form a child. Thus, child = dad * mom. Look, they're multiplying! Get it?

top Who came up with this idea?

I wrote the original program to produce a solution to a puzzle contest in the December 1996 Games Magazine. The contest (called Gridlock) was to encode as many game titles as possible into an 8 by 8 grid of letters. My best solution using this algorithm was 32 titles (achieved after a 20 hour run). This was an impressive 84% of optimal: the human winner (not me) was able to encode 38 titles.
The genetic algorithm used here is my own design. Given the sheer simple-mindedness of the algorithm, it is amazing it works at all. But what do you know? It works well. It clearly illustrates the power of genetic programming. Instead of programming for weeks and running the program for seconds and getting a good answer, I programmed for days and ran the program for days to get a good answer. In many cases this is an excellent tradeoff.
The simpler brute-force method--try every possible combination--is not possible here: there are ~1075 give or take a few billion billion billion trillion (rotations and reflections).

top Isn't Boggle® a registered trademark?

Yes. Boggle® is a registered trademark of Parker Brothers, a division of Hasbro, Inc. They "provide children and families with the highest quality and most recognizable toys and games in the world." In fact, so recognizable, the easiest way to explain the kind of encoding used here is to refer to this popular and ubiquitous game.
While I believe Boggle® is a high quality and recognizable game, I have no connection with Hasbro, Inc. and am using this trademark without permission. Shame on me.

top
Send email to Guy@ProfessorGuy.org

© Copyright 1999 Guy T. Schafer
Last updated October, 1999