A Genetic Algorithm for Cartographic Label Placement by Guy T. Schafer

top What is Carto?

Carto is a Windows program that uses a genetic algorithm to place labels near point features of a map. The labels have a uniform size (60 x 12 points) and can be placed in only one of 4 positions like this:

The optimal labeling has the fewest number of labels that overlap. Finding the optimal labeling is an NP-hard problem so there is likely no analytical solution which is efficient. Carto uses the power of the genetic algorithm to more quickly converge to a good solution without having to consider all 4n possible labelings--an impossible task.

Carto starts with a set (called a generation) of randomly generated labelings and applies fitness rules to them. The fittest labelings mate and spawn a new generation of labelings. This simple procedure is repeated over and over. As if by magic, a result is obtained that would be very difficult to program using a more conventional analytical approach.

top Where do I get the program?

You can get executable (Windows) and a sample map file. Or get the C++ source code, and test map in a zip file. Please feel free to use any or all of this code for your own projects.

top How do I use the program?

  1. Press the Load Map Coordinates button to read in an array of point features.
  2. Press the Go button to start the process. When started
  3. Press the Stop button to pause the process. When paused
  4. Adjust the mutation rate at any time.

top How does the genetic algorithm work?

Terms:
map is an entire labeling (a member of generation).
mapi is the choice of position (UL, UR, LL, or LR) for the ith point of the map.
  1. A set of points is read from a file, an x value and a y value on each line. This is stored in an array of points.
  2. A generation of 105 maps is created and filled with the points from the array along with random labeling positions (these positions are the "genes").
  3. A fitness test is performed on all members of the generation. The fitness of a position f(mapi) is 1 if it is NOT overlapped by any other label, otherwise it is 0. The fitness of a map is equal to the sum of the fitnesses of all the points.
    f(map) = Sumi f(mapi). That is, the fitness of a map is the number of labels which are NOT overlapped.
  4. The 15 most fit grids are chosen. If f(map) > f(bestmap) then
    1. bestmap = map
    2. Display map.
  5. The parents are paired in every possible combination to produce 105 children (15 choose 2 = 105). The child is given a position for mapi by one of the following methods (depending on an #ifdef in the code):
    Directed evolution
    1. If f(momi) = 1 and f(dadi) = 0
      then childi = momi.
    2. If f(momi) = 0 and f(dadi) = 1
      then childi = dadi.
    3. If f(momi) = 1 and f(dadi) = 1
      then childi is set randomly to either momi or dadi
    4. (Crossover)
    5. If f(momi) = 0 and f(dadi) = 0
      then childi is set randomly to any one of the 4 possible positions
    6. (Mutator)
    Natural evolution

    childi is set randomly to either momi or dadi

  6. Apply an additional mutator to drive the map away from local maxima:
    Loop 0 to mutation setting
    {
      randomly choose a map
      randomly choose a position mapi
      set mapi to a random position.
    }
    Mutation setting can be adjusted from 0 to 20n where n = the number of points. The highest setting was determined empirically to be deadly--f(map) quickly plummets and no improvement occurs.
  7. Repeat steps 3 - 7 until the user hits the Stop button or all labels are not overlapped.

top Does the program do anything clever?

Once you read and understand the algorithm above it is easy to believe the program must have been written by chimps. I prefer to think of its simplicity as "elegance" rather than "idiocy." The proof is, of course, that this scheme does (miraculously) work.
There are a few clever things done in the design of the program to make each generation proceed more quickly. The reason is simple. This program is generally massively compute bound. It runs for days or weeks. Any savings in time is multiplied by millions of iterations. There are also some clever things done to make the code easier to use:

top Who came up with this idea?

This is original work by Guy T. Schafer. Much of the code was taken and modified from a program (called Genetic Boggle) written to encode specific words into a Boggle grid. That program is available at professorguy.com/ideas/Boggle/UserGuide.html. It is also my original work.

top Some empirical data.

Since the problem as stated is NP-hard, there is no way to calculate the optimum solution. However, this problem (with the same data) has been assigned to many programmers over many years. These programmers have used various algorithms for generating good maps. The best solutions ever seen in that time are The solutions created (in PostScript format) with my genetic algorithm are Further, a comparison of the 2 methods in step 5 above reveals that directed evolution slightly slows each generation because of the additional checks which must be done. (Remember that step 5 is performed 105n times each generation where n is the number of map points.) Even so, directed evolution is faster to reach fair solutions than natural evolution and is less dependent on the mutator settings in step 6 (since it has a mutator built in).

A surprising result is that natural evolution is much faster to reach really good solutions. For example, the results of this run on the 150 town data are typical:

Directed evolution
  • 120 clear: 11 generations
  • 135 clear: 241,131 generations
Natural evolution
  • 120 clear: 140 generations
  • 135 clear: 2,108 generations

These astonishing results required careful adjustment of the mutator for the natural evolution whereas a wide range of mutation settings seemed identical to directed evolution. However, the program's slider control let me tweak this setting to a good value within a few dozen generations.

top


Send email to guy@professorguy.com

© Copyright 2001 Guy T. Schafer
Last updated August, 2011