www.gp-field-guide.org.uk
Contents Top Previous Next

4.2 Step-by-Step Sample Run

Now that we have performed the five preparatory steps, the run of GP can be launched. The GP setup is summarised in Tableá 4.1 .


Tableá4.1: Parameters for example genetic programming run


Objective:

Find program whose output matches x2 + x + 1 over the range -1 ú x ú +1.

Function set:

+, -, % (protected division), and Î; all operating on floats

Terminal set:

x, and constants chosen randomly between -5 and +5

Fitness:

sum of absolute errors for x{-1.0 ,-0.9 ,...0.9 ,1.0}

Selection:

fitness proportionate (roulette wheel) non elitist

Initial pop:

ramped half-and-half (depth 1 to 2. 50% of terminals are constants)

Parameters:

population sizeá4, 50% subtree crossover, 25%áreproduction, 25%ásubtree mutation, no tree size limits

Termination:

Individual with fitness better than 0.1 found




4.2.1 Initialisation

GP starts by randomly creating a population of four individual computer programs. The four programs are shown in Figureá 4.1 in the form of trees.

The first randomly constructed program tree (Figureá 4.1 a) is equivalent to the expression x + 1. The second program (Figureá 4.1 b) adds the constant terminal 1 to the result of multiplying x by x and is equivalent to x2+1. The third program (Figureá 4.1 c) adds the constant terminal 2 to the constant terminal 0 and is equivalent to the constant value 2. The fourth program (Figureá 4.1 d) is equivalent to x.


PIC

Figureá4.1: Initial population of four randomly created individuals of generationá0.



PIC

Figureá4.2: Graphs of the evolved functions from generation 0. The solid line in each plot is the target function x2 + x + 1, with the dashed line being the evolved functions from the first generation (see Figureá 4.1 ). The fitness of each of the four randomly created individuals of generation 0 is approximately proportional to the area between two curves, with the actual fitness values being 7.7, 11.0, 17.98 and 28.7 for individuals (a) through (d), respectively.


4.2.2 Fitness Evaluation

Randomly created computer programs will typically be very poor at solving any problem. However, even in a population of randomly created programs, some programs are better than others. The four random individuals from generation 0 in Figureá 4.1 produce outputs that deviate by different amounts from the target function x2 + x + 1. Figureá 4.2 compares the plots of each of the four individuals in Figureá 4.1 and the target quadratic function x2 + x + 1. The sum of absolute errors for the straight line x+1 (the first individual) is 7.7 (Figureá 4.2 a). The sum of absolute errors for the parabola x2+1 (the second individual) is 11.0 (Figureá 4.2 b). The sums of the absolute errors for the remaining two individuals are 17.98 (Figureá 4.2 c) and 28.7 (Figureá 4.2 d).

As can be seen in Figureá 4.2 , the straight line x+1 (Figureá 4.2 a) is closer to the parabola x2 + x + 1 in the range from -1 to +1 than any of three other programs in the population. This straight line is, of course, not equivalent to the parabola x2 + x + 1; it is not even a quadratic function. It is merely the best candidate that happened to emerge from the blind (and very limited) random search of generation 0.

In the valley of the blind,
the one-eyed man is king.

4.2.3 Selection, Crossover and Mutation

After the fitness of each individual in the population is found, GP then probabilistically selects the fitter programs from the population to act as the parents of the next generation. The genetic operations are applied to the selected individuals to create offspring programs. The important point is that our selection process is not greedy. Individuals that are known to be inferior still have some chance of being selected. The best individual in the population is not guaranteed to be selected and the worst individual in the population will not necessarily be excluded.

In this example we will start with the reproduction operation. Because the first individual (Figureá 4.1 a) is the most fit individual in the population, it is very likely to be selected to participate in a genetic operation. Let us suppose that this particular individual is, in fact, selected for reproduction. If so, it is copied, without alteration, into the next generation (generation 1). It is shown in Figureá 4.3 a as part of the population of the new generation.


PIC

Figureá4.3: Population of generation 1 (after one reproduction, one mutation, and two one-offspring crossover operations).


We next perform the mutation operation. Because selection is probabilistic, it is possible that the third best individual in the population (Figureá 4.1 c) is selected. One of the three nodes of this individual is then randomly picked as the site for the mutation. In this example, the constant terminal 2 is picked as the mutation site. This program is then randomly mutated by deleting the entire subtree rooted at the picked point (in this case, just the constant terminal 2) and inserting a subtree that is randomly constructed in the same way that the individuals of the initial random population were originally created. In this particular instance, the randomly grown subtree computes x divided by x using the protected division operation %. The resulting individual is shown in Figureá 4.3 b. This particular mutation changes the original individual from one having a constant value of 2 into one having a constant value of 1, improving its fitness from 17.98 to 11.0.

Finally, we use the crossover operation to generate our final two individuals for the next generation. Because the first and second individuals in generation 0 are both relatively fit, they are likely to be selected to participate in crossover. However, selection can always pick suboptimal individuals. So, let us assume that in our first application of crossover the pair of selected parents is composed of the above-average tree in Figuresá 4.1 a and the below-average tree in Figureá 4.1 d. One point of the first parent, namely the + function in Figureá 4.1 a, is randomly picked as the crossover point for the first parent. One point of the second parent, namely the leftmost terminal x in Figureá 4.1 d, is randomly picked as the crossover point for the second parent. The crossover operation is then performed on the two parents. The offspring (Figureá 4.3 c) is equivalent to x and is not particularly noteworthy.

Let us now assume, that in our second application of crossover, selection chooses the two most fit individuals as parents: the individual in Figureá 4.1 b as the first parent, and the individual in Figureá 4.1 a as the second. Let us further imagine that crossover picks the leftmost terminal x in Figureá 4.1 b as a crossover point for the first parent, and the + function in Figureá 4.1 a as the crossover point for the second parent. Now the offspring (Figureá 4.3 d) is equivalent to x2 + x + 1 and has a fitness (sum of absolute errors) of zero.

4.2.4 Termination and Solution Designation

Because the fitness of the individual in Figureá 4.3 d is below 0.1, the termination criterion for the run is satisfied and the run is automatically terminated. This best-so-far individual (Figureá 4.3 d) is then designated as the result of the run.

Note that the best-of-run individual (Figureá 4.3 d) incorporates a good trait (the quadratic term x2) from the first parent (Figureá 4.1 b) with two other good traits (the linear term x and constant term of 1) from the second parent (Figureá 4.1 a). The crossover operation thus produced a solution to this problem by recombining good traits from these two relatively fit parents into a superior (indeed, perfect) offspring.

This is, obviously, a highly simplified example, and the dynamics of a real GP run are typically far more complex than what is presented here. Also, in general, there is no guarantee that an exact solution like this will be found by GP.


www.gp-field-guide.org.uk
Contents Top Previous Next