Genetic programming (GP) is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.
TinyGP is a highly optimised GP system that was originally developed by Riccardo Poli to meet the specifications set out in the TinyGP competition of the Genetic and Evolutionary Computation Conference (GECCO) 2004. We include it as a working example of a real GP system, to show that GP software tools are not necessarily big, complex and difficult to understand. The system can be used as is or can be modified or extended for a user's specific applications. Furthermore, TinyGP may serve as a guide to other implementation of genetic programming.The source code of TinyGP is available here. The following section provides a description of the main characteristics of TinyGP. Section 2 describes the format for the input files for TinyGP. Section 3 provides further details on the implementation and the source code for a Java version of TinyGP. Finally, Section 4 describes a sample run of the system.
TinyGP is also described in the free book A Field Guide to Genetic Programming by Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee. Available as a free PDF, or in printed form from Lulu.com.
The book's web site and blog (see below) also lists a couple of bugs found in the original version of TinyGP. All have been fixed in the version available from this site.
POPSIZE(see below) crossover/mutation events have been performed.
DEPTH. Note 0 represents the depth of programs containing just one terminal.
// See below
FITNESSCASE1 // The fitness cases (one per line )
Each fitness case is of the form
X1 ... XN TARGET
XN represent a set of input
values for a program, while
TARGET represents the desired
output for the given inputs.
The header has the following entries
NVAR NRAND MIN_RAND MAX_RAND NFITCASES
NVAR is an integer representing the number of
variables the system should use,
NRAND is an integer
representing the number of random constants to be provided in the
MIN_RAND is a float representing the lower
limit of the range used to generate random constants,
MAX_RAND is the corresponding upper limit, and
NFITCASES is an integer representing the number of
NRAND can be set to 0, in which case
MAX_RAND are ignored.
1 100 -5 5 63
55 LINES OMITTED
x=0.0,0.1,0.2,...,6.2. The full file is available here.
The source code of TinyGP is available here. The original TinyGP system was implemented, in the C programming language, to maximise efficiency and minimise the size of the executable. The version presented here is a Java re-implementation of TinyGP. The original version did not allow the use of random numerical constants.
How does TinyGP work? The system is based on the standard flattened (linear) representation for trees, which effectively corresponds to listing the primitives in prefix notation but without any brackets. Each primitive occupies one byte. A program is simply a vector of characters. The parameters of the system are as specified in Section 1. They are fixed at compile time through a series of static class variables assignments. The operators used are subtree crossover and point mutation. The selection of the crossover points is performed at random with uniform probability. The primitive set and fitness function are as indicated above. The code uses recursion for the creation of the initial population (grow), for the identification of the subtree rooted at a particular crossover point (traverse), for program execution (run), and for printing programs (print indiv). A small number of global variables have been used. For example, the variable program is a program counter used during the recursive interpretation of programs, which is automatically incremented every time a primitive is evaluated. Although using global variables is normally considered bad programming practice, this was done purposely, after extensive experimentation, to reduce the executable's size. The code reads command line arguments using the standard args array.Generally the code is quite standard and should be self-explanatory for anyone who can program in Java, whether or not they have implemented a GP system before. Therefore very few comments have been provided in the source code. The source is provided below. The program should be compiled with the command
javac -O tiny gp.java
sin(x)dataset described in Section 2. If the dataset is stored in a file problem.dat, the program can simply be launched with the command java tiny gp. Otherwise, the user can specify the datafile on the command line, by giving the command
java tiny gp SEED FILE
SEEDis the seed for the random number generator and
FILEis the dataset file name.
-- TINY GP (Java version) --
Generation=0 AvgFitness = 42.53760218120066 BestFitness=39.997953686554816 AvgSize=10.9804
( 1.589816334458055 / -2.128280559500907 )
Generatio n=1 AvgFitness = 1226.404415960088 BestFitness=24.441994244449372 AvgSize=10.97024
( ( ( - 0.3839867944222206 / -2.2796127162428403 ) + ( -1.8386812853617673 / -1.06553859601892 ) ) - ( ( ( 4.984026635222818 * ( 0.17196413319878445 - 0.1294044215655923 ) ) + ( X1 - -1.8956001614031734 ) ) * 0.3627020733460027 ) )
rpoli @ essex . ac