Genetic Boggle
What is Genetic Boggle?
Genetic Boggle is a program that uses a
genetic algorithm to encode
as many strings as possible from a given list into
a grid of letters using
Boggle encoding.
It starts with a set (called a generation) of randomly
generated grids and applies fitness
rules to them. The fittest grids mate and spawn a
new generation of grids. This simple procedure is
repeated over and over. As if by magic, a result is
obtained that would be very difficult to program
using a more conventional analytical approach.
How do I get the program?
If you want to run the program,
download the executable file
(Windows only).
Try it out on any ASCII list of words, or just save
my list.
If you want to see the C++ code, get this all-inclusive
zip file.
How do I use the program?
- Press the LOAD button to read in a lexicon.
- Press the GO button to start the process.
When started
- The generation number will increase to show progress.
- A graph will display showing the fitness of
the best grid from every generation.
- The best grid that has ever been encountered will
display in the "Best of Breed" display. This will change
whenever the grid displayed is bested.
- The GO button becomes a
START AGAIN button, but is not available.
- Press the STOP button to pause the process.
When paused
- The RESUME button becomes available.
Press it to resume operation with the
current generation.
- The START AGAIN button becomes available. Press it
to restart from generation 0 with new random grids.
- Press the SAVE button to save the best grid
to a text file.
- Adjust the mutation rate at any time.
What is Boggle encoding?
Boggle is a game in which letters are randomly
placed into a grid and players have to find the
words that are present in that grid. Since it
is unlikely that random letter tiles (the game
actually uses cubes) will show a word spelled
from beginning to end in left-to-right order,
these generalized rules are available for finding words:
- Letters which are adjacent in the word must
be adjacent in the grid.
- A letter L1 is said to be adjacent in the grid
to a letter L2 if and only if L2 is in
one of the 8 grid locations which immediately surround
L1.
- Letters in the grid may be used more than once,
but not consecutively.
This means the grid
encodes the word
fete but not the word
feet.
The word
fete is said to be boggle-encoded in this grid.
How does the genetic algorithm work?
Terms:
grid is an 8x8 grid of letters (a member of generation).
gridx,y is the letter at position (x,y)
in grid
grid.
f(grid) is the fitness of a grid. It equals the number of
lexicon strings currently encoded in the grid.
f(gridx,y) is the fitness of the letter at
position (x,y). It equals the total number of times
that letter is used to encode lexicon strings.
- A set of strings is read from a file, one per line. This
is the lexicon. Only words from the lexicon will be counted
in determining fitness.
- A generation of 105 grids of 64 letters (8x8) each
is created and filled with random characters.
- A fitness test is performed on all
members of the generation to calculate f(grid).
- The 15 most fit grids are chosen. If f(grid) >
f(bestgrid) then
- bestgrid = grid
- Display grid.
- The fitness of every character in each of the 15 fittest
grids (called parents) is determined.
- The parents are paired in every possible combination to
produce 105 children (15 choose 2 = 105).
The childx,y = f(momx,y)
> f(dadx,y) ? momx,y :
dadx,y. That is, take either the letter from
one parent or the other based on which letter is used more.
- When f(momx,y) = f(dadx,y)
AND f(momx,y) > 0, randomly choose either
momx,y or dadx,y.
This will achieve crossover.
- When f(momx,y) = f(dadx,y)
AND f(momx,y) = 0, choose a random letter.
This will serve as a mutator.
- Apply an additional mutator to drive the grid away from
local minima:
Loop 0 to mutation setting
{ randomly choose a grid
randomly choose a position (x,y)
set gridx,y to a random letter. }
Mutation setting can be adjusted from 0
(no mutation except when the letter is not used--see 6b)
to 5000. The highest setting was determined empirically to be a
deadly dose--f(grid) quickly drops to near 0.
- Repeat steps 3 - 7 until the user hits the STOP button
or all strings are encoded.
Does the program do anything clever?
Once you read and understand the algorithm above it is easy
to believe the program must have been written by chimps.
I prefer to think of its simplicity as "elegance" rather
than "idiocy." The proof is, of course, that this scheme
does (miraculously) work.
There are a few clever things done in the design of the
program to make each generation proceed more quickly.
The reason is simple. This program is generally massively
compute bound. It runs for days or weeks. Any savings in
time is multiplied by millions of iterations.
There are also some clever things done to make the code
easier to use:
- A template is used for members of a generation. The
grid class is given to this template. This code could
be quickly modified to use different objects as members of
the generations.
- The Generation object has an overloaded ++ operator.
This has the obvious effect of iterating to (and thus
creating) the next generation.
- The Grid object has an overloaded * operator.
This is used to mate two grids to form a child.
Thus, child = dad * mom. Look, they're multiplying! Get it?
Who came up with this idea?
I wrote the original program to produce a solution to a
puzzle contest in the December 1996
Games Magazine.
The contest (called Gridlock) was to encode as many
game titles as possible
into an 8 by 8 grid of letters. My best solution using this
algorithm was 32 titles (achieved after a 20 hour run).
This was an impressive 84% of optimal:
the human winner (not me) was able to encode 38 titles.
The genetic algorithm used here is my own design. Given the sheer
simple-mindedness of the algorithm, it is amazing it works
at all. But what do you know? It works well.
It clearly illustrates the
power of genetic programming. Instead of programming
for weeks and running the program for seconds and getting
a good answer, I programmed for days and ran the program for
days to get a good answer. In many cases this is an
excellent tradeoff.
The simpler brute-force method--try every possible
combination--is not possible here:
there are ~10
75 give or take a few billion
billion billion trillion (rotations and reflections).
Isn't Boggle®
a registered trademark?
Yes. Boggle
® is a registered trademark of
Parker Brothers, a division of
Hasbro, Inc. They "provide children and families with
the highest quality and most recognizable toys
and games in the world." In fact, so recognizable, the
easiest way to explain the kind of encoding used here is to
refer to this popular and ubiquitous game.
While I believe Boggle
® is a high quality and
recognizable game, I have no connection with Hasbro, Inc.
and am using this trademark without permission.
Shame on me.
Send email to
guy@professorguy.com
© Copyright 2004
Guy T. Schafer
Last updated August, 2011