TinyGP (also known as Tiny GP) - A Tiny Genetic Programming Implementation

Genetic programming (GP) is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.

TinyGP is a highly optimised GP system that was originally developed by Riccardo Poli to meet the specifications set out in the TinyGP competition of the Genetic and Evolutionary Computation Conference (GECCO) 2004. We include it as a working example of a real GP system, to show that GP software tools are not necessarily big, complex and difficult to understand. The system can be used as is or can be modified or extended for a user's specific applications. Furthermore, TinyGP may serve as a guide to other implementation of genetic programming.

The source code of TinyGP is available here. The following section provides a description of the main characteristics of TinyGP. Section 2 describes the format for the input files for TinyGP. Section 3 provides further details on the implementation and the source code for a Java version of TinyGP. Finally, Section 4 describes a sample run of the system.

TinyGP is also described in the free book A Field Guide to Genetic Programming by Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee. Available as a free PDF, or in printed form from Lulu.com.

The book's web site and blog (see below) also lists a couple of bugs found in the original version of TinyGP. All have been fixed in the version available from this site.


1. Overview of TinyGP

TinyGP is a symbolic regression system with the following characteristics:

2 Input Data Files for TinyGP

The input files for TinyGP have the following plain ASCII format:
HEADER
// See below
FITNESSCASE1 // The fitness cases (one per line )
FITNESSCASE2
FITNESSCASE3
....

Each fitness case is of the form X1 ... XN TARGET
where X1 to XN represent a set of input values for a program, while TARGET represents the desired output for the given inputs.

The header has the following entries
NVAR NRAND MIN_RAND MAX_RAND NFITCASES
where NVAR is an integer representing the number of variables the system should use, NRAND is an integer representing the number of random constants to be provided in the primitive set, MIN_RAND is a float representing the lower limit of the range used to generate random constants, MAX_RAND is the corresponding upper limit, and NFITCASES is an integer representing the number of fitness cases. NRAND can be set to 0, in which case MIN_RAND and MAX_RAND are ignored.

For example:
1 100 -5 5 63
0.0 0
0.1 0.0998334166468282
0.2 0.198669330795061
0.3 0.29552020666134
....
55 LINES OMITTED
....
5.9 -0.373876664830236
6.0 -0.279415498198926
6.1 -0.182162504272095
6.2 -0.0830894028174964

These fitness cases are sin(x) for x=0.0,0.1,0.2,...,6.2. The full file is available here.

3 Source Code

The source code of TinyGP is available here. The original TinyGP system was implemented, in the C programming language, to maximise efficiency and minimise the size of the executable. The version presented here is a Java re-implementation of TinyGP. The original version did not allow the use of random numerical constants.

How does TinyGP work? The system is based on the standard flattened (linear) representation for trees, which effectively corresponds to listing the primitives in prefix notation but without any brackets. Each primitive occupies one byte. A program is simply a vector of characters. The parameters of the system are as specified in Section 1. They are fixed at compile time through a series of static class variables assignments. The operators used are subtree crossover and point mutation. The selection of the crossover points is performed at random with uniform probability. The primitive set and fitness function are as indicated above. The code uses recursion for the creation of the initial population (grow), for the identification of the subtree rooted at a particular crossover point (traverse), for program execution (run), and for printing programs (print indiv). A small number of global variables have been used. For example, the variable program is a program counter used during the recursive interpretation of programs, which is automatically incremented every time a primitive is evaluated. Although using global variables is normally considered bad programming practice, this was done purposely, after extensive experimentation, to reduce the executable's size. The code reads command line arguments using the standard args array.

Generally the code is quite standard and should be self-explanatory for anyone who can program in Java, whether or not they have implemented a GP system before. Therefore very few comments have been provided in the source code. The source is provided below. The program should be compiled with the command
javac -O tiny gp.java

4 A Sample Run of Tiny GP

We ran TinyGP on the sin(x) dataset described in Section 2. If the dataset is stored in a file problem.dat, the program can simply be launched with the command java tiny gp. Otherwise, the user can specify the datafile on the command line, by giving the command
java tiny gp SEED FILE
where SEED is the seed for the random number generator and FILE is the dataset file name.
The output produced by the program is something like the following
-- TINY GP (Java version) --
SEED=-1
MAX_LEN=10000
POPSIZE=100000
DEPTH=5
CROSSOVER_PROB=0.9
PMUT_PER_NODE=0.05
MIN_RANDOM=-5.0
MAX_RANDOM=5.0
GENERATIONS=100
TSIZE=2
----------------------------------
Generation=0 AvgFitness = 42.53760218120066 BestFitness=39.997953686554816 AvgSize=10.9804
BestIndividual:
( 1.589816334458055 / -2.128280559500907 )
Generatio n=1 AvgFitness = 1226.404415960088 BestFitness=24.441994244449372 AvgSize=10.97024
BestIndividual:
( ( ( - 0.3839867944222206 / -2.2796127162428403 ) + ( -1.8386812853617673 / -1.06553859601892 ) ) - ( ( ( 4.984026635222818 * ( 0.17196413319878445 - 0.1294044215655923 ) ) + ( X1 - -1.8956001614031734 ) ) * 0.3627020733460027 ) )
...

Page maintained by Riccardo Poli
Department of Computing and Electronic Systems
University of Essex
Wivenhoe Park, Colchester, CO4 3SQ, UK
Telephone: +44-1206-872338
Fax: +44-1206-872788

E-mail: rpoli @ essex . ac . uk

Last modified: Thu Feb 28 16:04:54 GMT 2008