A Genetic Algorithm for Cartographic
Label Placement by Guy T. Schafer
What is Carto?
Carto is a Windows program that uses a
genetic algorithm to place labels near point
features of a map. The labels have a uniform size (60 x 12 points)
and can be placed in only one of 4 positions like this:
The optimal labeling has the fewest number of labels that overlap.
Finding the optimal labeling is an NP-hard problem so there is likely
no analytical solution which is efficient. Carto uses the power
of the genetic algorithm to more quickly converge to a good solution
without having to consider all 4n possible labelings--an
impossible task.
Carto starts with a set (called a generation) of randomly
generated labelings and applies fitness rules to them.
The fittest labelings mate and spawn a
new generation of labelings. This simple procedure is
repeated over and over. As if by magic, a result is
obtained that would be very difficult to program
using a more conventional analytical approach.
Where do I get the program?
You can get executable (Windows) and a
sample map file.
Or get the C++ source code, and test map in
a zip file. Please feel free
to use any or all of this code for your own projects.
How do I use the program?
- Press the Load Map Coordinates button to read
in an array of point features.
- Press the Go button to start the process.
When started
- The generation number will increase to show progress.
- A graph will display showing the fitness of
the best labeling from every generation.
- The best labeling that has ever been encountered will
display in the right side display. This will change
whenever the labeling displayed is bested.
- The Go button becomes a
Start Again button, but is not available.
- Press the Stop button to pause the process.
When paused
- The Resume button becomes available.
Press it to resume operation with the
current generation.
- The Start Again button becomes available. Press it
to restart from generation 0 with new random labelings.
- Press the Save Postscript Map button to save
the best labeling (shown at the right) to a postscript file.
- Adjust the mutation rate at any time.
How does the genetic algorithm work?
Terms:
map is an entire labeling (a member of generation).
mapi is the choice of position (UL, UR, LL, or LR)
for the i
th point of the
map.
- A set of points is read from a file,
an x value and a y value on each line. This is stored in an
array of points.
- A generation of 105 maps is created and filled with
the points from the array along with random labeling positions
(these positions are the "genes").
- A fitness test is performed on all members
of the generation. The fitness of a position
f(mapi) is 1 if it is NOT overlapped
by any other label, otherwise it is 0.
The fitness of a map is equal to the sum of the fitnesses of all
the points.
f(map) = Sumi f(mapi).
That is, the fitness of a map is the number of labels which are
NOT overlapped.
- The 15 most fit grids are chosen. If f(map) >
f(bestmap) then
- bestmap = map
- Display map.
- The parents are paired in every possible combination to
produce 105 children (15 choose 2 = 105).
The child is given a position for mapi by one of
the following methods (depending on an #ifdef in the code):
Directed evolution
- If f(momi) = 1 and
f(dadi) = 0
then
childi = momi.
- If f(momi) = 0 and
f(dadi) = 1
then
childi = dadi.
- If f(momi) = 1 and
f(dadi) = 1
then childi
is set randomly to either
momi or dadi
(Crossover)
- If f(momi) = 0 and
f(dadi) = 0
then childi
is set randomly to any one of the 4 possible positions
(Mutator)
|
Natural evolution
childi is set randomly to either
momi or dadi
|
- Apply an additional mutator to drive the
map away from local maxima:
Loop 0 to mutation setting
{
randomly choose a map
randomly choose a position mapi
set mapi to a random position.
}
Mutation setting can be adjusted from 0
to 20n where n = the number of points.
The highest setting was determined empirically to be
deadly--f(map) quickly plummets and no
improvement occurs.
- Repeat steps 3 - 7 until the user hits the Stop button
or all labels are not overlapped.
Does the program do anything clever?
Once you read and understand the algorithm above it is easy
to believe the program must have been written by chimps.
I prefer to think of its simplicity as "elegance" rather
than "idiocy." The proof is, of course, that this scheme
does (miraculously) work.
There are a few clever things done in the design of the
program to make each generation proceed more quickly.
The reason is simple. This program is generally massively
compute bound. It runs for days or weeks. Any savings in
time is multiplied by millions of iterations.
- There are two arrays that each can hold an entire
generation of grids. The odd numbered generations are kept
in one, the evens in the other. This prevents the
parents from having to "get out of the way" of the
children which will inhabit the next generation. It
saves the program from having to copy the fairly
complicated objects to some holding location.
- Tests for odd or even generations are done
outside of the
processing loops even though it makes the code look
nasty. Instead of the tidy
for (x...)
for (y...)
if (even)
do even stuff
else
do odd stuff
the program is written
if (even)
{
for (x...)
for (y...)
do even stuff
} else {
for (x...)
for (y...)
do odd stuff
}
This cuts down the total number of operations.
Good compilers are supposed to do this optimization
automatically. I don't leave it to chance.
There are also some clever things done to make the code easier to use:
- A template is used for members of a generation. The
map class (CLayout) is given to this template. This code could
be quickly modified to use different objects as members of
the generations.
- The Generation object has an overloaded ++ operator.
This has the obvious effect of iterating to (and thus
creating) the next generation.
Who came up with this idea?
This is original work by Guy T. Schafer. Much of the code
was taken and modified from a program (called Genetic Boggle)
written to encode
specific words into a Boggle grid. That program is available at
professorguy.com/ideas/Boggle/UserGuide.html. It is also
my original work.
Some empirical data.
Since the problem as stated is NP-hard, there is no way to calculate
the optimum solution. However, this problem (with the same data)
has been assigned to many programmers over many years. These
programmers have used various algorithms for generating good maps.
The best solutions ever seen in that time are
The solutions created (in PostScript format)
with my genetic algorithm are
Further, a comparison of the 2 methods in step 5 above reveals that
directed evolution slightly slows each generation because of the
additional checks which must be done. (Remember that step 5 is
performed 105
n times each generation where
n is the
number of map points.) Even so, directed evolution is faster to
reach fair solutions than natural evolution and is less dependent
on the mutator settings in step 6 (since it has a mutator built in).
A surprising result is that natural evolution is much faster to
reach really good solutions. For example, the results of this run
on the 150 town data are typical:
Directed evolution
- 120 clear: 11 generations
- 135 clear: 241,131 generations
|
Natural evolution
- 120 clear: 140 generations
- 135 clear: 2,108 generations
|
These astonishing results required careful adjustment of the mutator
for the natural evolution whereas a wide range of mutation settings
seemed identical to directed evolution. However, the program's slider
control let me tweak this setting to a good value within a few
dozen generations.
Send email to
guy@professorguy.com
© Copyright 2001
Guy T. Schafer
Last updated August, 2011