www.gp-field-guide.org.uk

## B.1 Overview of TinyGP

TinyGP is a symbolic regression system with the following characteristics:

1. The terminal set includes a user-definable number of floating point variables (named X1 to XN).
2. The function set includes multiplication, protected division, subtraction and addition.
3. The fitness cases are read from a file (the format is given below).
4. The system is steady state. A "generation" is considered concluded when POPSIZE (see below) crossover/mutation events have been performed.
5. Selection is performed using tournament selection.
6. Negative tournaments are used for the selection of the individuals to be replaced at each steady-state-GP iteration.
7. Subtree crossover is used. Crossover points are selected uniformly, so every node is equally likely to be chosen.
8. Point mutation is used. That is, points (nodes) in the tree are randomly chosen. If a point is a terminal, then it is replaced by another randomly chosen terminal. If it is a function, then it is replaced by another randomly chosen function with the same number of inputs.
9. The following parameters are implemented as static class variables:
• The maximum length any GP program can take: MAX_LEN.
• The size of the population: POPSIZE.
• The maximum depth initial programs can have: DEPTH. Note that a depth of 0 corresponds to programs containing just one terminal.
• The maximum number of generations allowed for a run: GENERATIONS.
• The probability of creating new individuals via crossover: CROSSOVER_PROB. New individuals are otherwise created via mutation, so the mutation probability is 1 - CROSSOVER_PROB.
• The mutation probability (per node) when point mutation is chosen as the variation operator: PMUT_PER_NODE.
• The tournament size: TSIZE.

10. The parameters and the random seed are printed when each run starts.
11. The fitness function is minus the sum of the absolute differences between the actual program output and the desired output for each fitness case. TinyGP maximises it.
12. The grow initialisation method is used to create the initial population.
13. At each generation the following statistics are calculated and printed:
• The generation number.
• The average fitness of the individuals in the population.
• The fitness of the best individual in the population.
• The average size of the programs in the current generation.
• The best individual in the population.

14. The random number generator can be seeded via the command line. If this command line parameter is absent, the system uses the current time to seed the random number generator.
15. The name of the file containing the fitness cases can be passed to the system via the command line. If the command line parameter is absent, the system assumes the data are stored in the current directory in a file called "problem.dat".
16. If the total error made by the best program goes below $10^{-5}$, TinyGP prints a message indicating success and stops. If the problem has not been solved after the maximum number of generations, it prints a message indicating failure and stops.
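
Items 2, 11 and 16 together describe how a candidate program is scored and when a run stops. The Java sketch below illustrates that scoring under stated assumptions rather than reproducing TinyGP's source. The names `FitnessSketch`, `protectedDiv`, `fitness`, `solved` and `SUCCESS_THRESHOLD` are hypothetical, the program is modelled as a plain Java function instead of a GP tree, and the protection rule (return the numerator when the denominator's magnitude is at most 0.001) is one common convention picked for illustration, since the overview does not say which rule is used.

```java
import java.util.function.ToDoubleFunction;

final class FitnessSketch {
    // Success threshold on the total absolute error (item 16).
    static final double SUCCESS_THRESHOLD = 1e-5;

    // ASSUMPTION: fall back to the numerator when the denominator is too
    // close to zero. Other protection conventions exist.
    static double protectedDiv(double num, double den) {
        return Math.abs(den) <= 0.001 ? num : num / den;
    }

    // Fitness is minus the sum of absolute errors over all fitness cases,
    // so a perfect program scores 0 and worse programs are more negative.
    static double fitness(ToDoubleFunction<double[]> program,
                          double[][] inputs, double[] targets) {
        double fit = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            fit -= Math.abs(program.applyAsDouble(inputs[i]) - targets[i]);
        }
        return fit;
    }

    // A run counts as solved when the total error (-fitness) of the best
    // program is below the threshold.
    static boolean solved(double bestFitness) {
        return -bestFitness < SUCCESS_THRESHOLD;
    }

    public static void main(String[] args) {
        double[][] inputs = {{1.0}, {2.0}, {3.0}};
        double[] targets = {1.0, 4.0, 9.0}; // target behaviour: X1 * X1

        double exact = fitness(x -> x[0] * x[0], inputs, targets);
        double rough = fitness(x -> x[0] + x[0], inputs, targets);

        System.out.println("exact: " + exact + " solved=" + solved(exact));
        System.out.println("rough: " + rough + " solved=" + solved(rough));
    }
}
```

Because fitness is the negated error, maximising it is the same as minimising the total absolute error, which is why the stopping test negates the value before comparing it with the threshold.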

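The selection and mutation steps in items 5, 6 and 8 can be illustrated with a minimal Java sketch. It is an illustration under assumptions, not TinyGP's actual code: the class `SteadyStateSketch`, its method names, and the integer encoding of programs (a flat prefix-ordered array in which codes below `FSET_START` are terminals and the four codes from `FSET_START` upwards are the binary functions) are all hypothetical. The mutation shown redraws a symbol of the same kind and may occasionally redraw the same symbol.

```java
import java.util.Arrays;
import java.util.Random;

final class SteadyStateSketch {
    // ASSUMED encoding: codes below FSET_START are terminals (X1..XN),
    // codes FSET_START..FSET_START+3 are the four binary functions.
    static final int FSET_START = 1000;
    static final int FSET_SIZE = 4; // +, -, *, protected /

    // Tournament selection: sample tsize individuals uniformly (with
    // replacement) and return the index of the fittest one.
    static int tournament(double[] fitness, int tsize, Random rng) {
        int best = rng.nextInt(fitness.length);
        for (int i = 1; i < tsize; i++) {
            int competitor = rng.nextInt(fitness.length);
            if (fitness[competitor] > fitness[best]) {
                best = competitor;
            }
        }
        return best;
    }

    // Negative tournament: same sampling, but return the least fit
    // individual, which is the one to be replaced.
    static int negativeTournament(double[] fitness, int tsize, Random rng) {
        int worst = rng.nextInt(fitness.length);
        for (int i = 1; i < tsize; i++) {
            int competitor = rng.nextInt(fitness.length);
            if (fitness[competitor] < fitness[worst]) {
                worst = competitor;
            }
        }
        return worst;
    }

    // Point mutation: each node is mutated with probability pmutPerNode.
    // Terminals become random terminals; functions become random functions.
    // All functions here are binary, so the arity is preserved.
    static int[] pointMutation(int[] parent, double pmutPerNode,
                               int numTerminals, Random rng) {
        int[] child = parent.clone();
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < pmutPerNode) {
                if (child[i] < FSET_START) {
                    child[i] = rng.nextInt(numTerminals);
                } else {
                    child[i] = FSET_START + rng.nextInt(FSET_SIZE);
                }
            }
        }
        return child;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] fitness = {-3.5, -0.2, -7.1, -1.4};
        int[][] population = {
            {FSET_START, 0, 1},     // X1 + X2
            {FSET_START + 2, 0, 0}, // X1 * X1
            {1},                    // X2
            {FSET_START + 1, 1, 0}  // X2 - X1
        };

        // One steady-state event: pick a parent, mutate it, and overwrite
        // the loser of a negative tournament with the offspring.
        int parent = tournament(fitness, 2, rng);
        int[] child = pointMutation(population[parent], 0.05, 2, rng);
        int victim = negativeTournament(fitness, 2, rng);
        population[victim] = child; // the child's fitness would be evaluated here

        System.out.println("parent=" + parent + " replaced=" + victim
                + " child=" + Arrays.toString(child));
    }
}
```

Since point mutation never changes the number of arguments a node takes, the mutated array is still a valid prefix-ordered program of the same length, so no repair step is needed after mutation.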