Riccardo Poli, Mathew Salvaris,
and Caterina Cinel
1School of Computer Science and Electronic Engineering,
University of Essex,
Wivenhoe Park, CO4 3SQ, UK
Many techniques to control bloat have been proposed in the last two decades (for recent reviews see [10,5,15,1]). One of these is the Tarpeian method introduced in [8]. The method is extremely simple in its implementation. All that is needed is a wrapper for the fitness function like the following:
if size(program) average_program_size and random() then
return( );
else
return( fitness(program) );
were is a real number between 0 and 1, random() is a function which returns uniformly distributed random numbers in the range and is a constant which represents an extremely low (or high, if minimising) fitness value such that individuals with such fitness are almost guaranteed not to be selected.
The Tarpeian method has been used in a variety of studies and applications. For example, in [6] its performance and generalisation capabilities were studied, while it was compared with other bloatcontrol techniques in [5,16,1]. The method has been used with success in the evolution of bin packing heuristics [3,2], in the evolution of image analysis operators [13], in artificial financial markets based on GP [7], just to mention a few.
In all cases the Tarpeian method has been a solid and efficient choice. A particularly nice feature of this method is that, because the wrapper does not evaluate the individuals being given a bad fitness, the higher the antibloat intensity, , the faster the algorithm. All studies and applications, however, have had to determine by trial and error the value of the parameter such that the size of programs would not exceed a given range of sizes. In other words, for a new application there was no way of saying a priori what values of corresponded to what values of mean program size.
Recent research [9] has developed a technique, called covariant Tarpeian method, that allows one to dynamically and optimally set the rate of application of the Tarpeian method, , in such a way as to completely control the evolution of the mean program size. The method has excellent properties but also some important drawbacks which we seek to overcome in this paper.
We, therefore, start the paper with a brief review of the covariant Tarpeian method (Section 2). This is followed by our proposed adaptive technique in Section 3. We test the algorithm on a variety of benchmark problems in Section 4. We propose our realworld application in the domain of Brain Computer Interfaces (BCI) in Section 5. We provide our conclusions in Section 6.
The covariant Tarpeian method was inspired by the size evolution equation developed
in [8,12]. As shown
in [11], this can be rewritten as follows
To model the effects on program size of the Tarpeian method in a GP system [9] specialised this equation and derived the following approximation:
With this explicit formulation of the expected size changes, we can find out for what value of we get . By setting the l.h.s. of Equation (2) to 0 and solving for , we obtain:
This equation allows one to determine how often the Tarpeian method should be applied to modify the fitness of aboveaveragesize programs as a function of a small set of descriptors of the current state of the population and of the parameter .
An important limitation of these results is that they can be applied only to generational systems with fitness proportionate selection. Also, they are applicable only to systems with symmetric crossover and no mutation.
While there are many GP systems that fit this particular specification, many more, perhaps most, do not. In particular, the preferred selection strategy in GP is tournament selection (particularly when large populations are used, since it requires a computation load which is linear in the population size, while fitness proportionate selection is quadratic). Also, steady state systems are very frequently used in GP, since they avoid the overhead of storing two generations.
For all these systems, we need a different recipe to adapt .
The recipe we propose to set to achieve this is very simple. We initially set . Then, at each generation, if the mean (or median) size of the individuals in the population is bigger than a usersettable threshold , we increment by an amount , otherwise we decrement by an amount , where is a constant . In these operations, is clipped within the interval . As we will show later, the threshold can be modified from generation to generation to follow a particular function of time, resulting in the size of the programs in the population approximately tracking the same function. However, in most cases .
Let us try to understand the rationale for this algorithm. We will do this by considering a generational system with fitness proportionate selection and we will apply the method to mean program sizes so as to be able to refer to the equations presented in the previous section for explanations. However, similar processes happen also in steady state algorithms, in systems which use forms of selection other than fitness proportionate and for median program sizes.
Since initially , if the user sets to be bigger than the initial mean program size, for as long as the mean program size is below the threshold , remains zero. So, there is no Tarpeian bloat control and size can evolve freely. In a generational system using fitness proportionate selection mean program size will, therefore, obey Equation (2) with , i.e., Equation (1).
In the presence of bloat, at some point will be sufficiently high for sufficiently long to cause the mean program size to surpass the threshold . At this point the algorithm start increasing thereby initially partially and then completely compensating this positive covariance (see numerator of Equation (2)). Normally it will take some generations to reach complete compensation, i.e., for to become zero. By this time the mean program size will have exceeded and therefore the algorithm continues to increase , thereby overcompensating for the covariance and causing the mean program size to start dropping. Since continues to be incremented until the mean program size is below , eventually the mean program size will become less than and in fact will continue to decrease for a while, until is sufficiently small not to compensate for the covariance any more. At this point, in the presence of a positive covariance, the cycle will repeat (typically dampened). Naturally, the frequency of these oscillations will depend on the choice of with the bigger its value the quicker the cycles.
Other than for the aforementioned effect, as we will see, the behaviour of the algorithm is little influenced by . However, we still have one degree of freedom to fix: the parameter . This is not different from other (noncovariant) bloat control methods. In the parsimony pressure method, for example, one needs to fix the parsimony coefficient, while in the original Tarpeian method one had to fix the rate of application of the operator. The difference between such methods and the adaptive Tarpeian method is that with the latter it is possible to precisely prescribe what size programs should have on average at each generations, while for the former the mapping from parameters to sizes is applicationdependent and generally unknown.
In order to extensively test the adaptive Tarpeian method we considered four symbolic regression problems. In three of the problems  the quartic, quintic and sextic polynomial problems  the objective was to evolve a function which fits a polynomial of the form , where is the degree of the polynomial, for in the range . In the fourth problem  the unequalcoefficients quartic polynomial problem  the objective was to fit the polynomial . Fitness cases for all problems were obtained by sampling the corresponding polynomials at the 21 equally spaced points .
We used a treebased GP system implemented in Python with all numerical calculations done using the Numpy library (which is implemented in C). The system uses a steadystate update policy. It evolved a population of either 100 or 1,000 individuals with tournament selection with a tournament size of 5, the grow method with a maximum initial depth of 4, subtree crossover applied with a rate of 0.9 and subtree mutation with a rate of 0.1. Both used a uniform selection of crossover/mutation points. Runs lasted 50 generations unless otherwise stated. The system used the primitive set shown in Table .1.
Primitive  Arity  Functionality 
0  Independent variable  
0  Numerical constants  
,  1  Sine and cosine functions 
*, +,  2  Standard arithmetic operators 
%  2  Protected division. It returns 1 if the denominator is 0; otherwise. 
if  3  If function. If the first argument is nonzero it returns its second argument, otherwise the third. 
We tested the system with and without adaptive Tarpeian bloat control. When the method was used, we tested it for and for a threshold . In each configuration we performed 30 independent runs. The threshold was chosen based on previous experience with polynomial regression.
In Figure .1 we show the evolution of the median (across 30 independent runs) of the bestofgeneration fitness and of the median program size in runs with the Quartic polynomial problem for different values of the adaptive Tarpeian method parameter and for populations of 1,000 individuals. In Figure .2 we show the corresponding results for the Sextic polynomial. For space limitations we do not report the results for the Quintic polynomial and the Quartic with unequal coefficients, but we note that results were qualitatively similar. Results with smaller populations of 100 individuals (not reported) were also similar in terms of program size dynamics, although best fitnesses were naturally significantly worse.
As we can see from Figures .1 and .2 there are no significant differences in the bestofgeneration fitness dynamics between the case where no bloat control is exerted and in the case of adaptive Tarpeian bloat control. This is also confirmed by the success rate figures reported in Table .2.^{3}
However, looking at the median size dynamics in Figures .1 and .2 we see that at all values of tested the method keeps the median size of the individuals in the population under tight control, while in the absence of bloat control we see significant bloat. As a result runs lasted between 10 (for the Quartic polynomial) and 20 (for the Quartic polynomial with unequal coefficients) times more without bloat control. What differs as is varied is the duration of the transient which takes place when the threshold is first hit, with the smallest values of being affected by the largest overshoots in program size.
Population  Problem  No bloat  Adaptive Tarpeian with  
size  polynomial  control  0.05  0.1  0.25  0.5  1.0  
100  Quartic  50.0  33.3  56.7  60.0  43.3  43.3  
100  Quintic  36.7  26.7  23.3  33.3  40.0  23.3  
100  Sextic  46.7  10.0  26.7  30.0  30.0  43.3  
100  Quartic Uneq.  20.0  3.3  0.0  3.3  6.7  10.0  
1000  Quartic  96.7  96.7  96.7  100.0  96.7  96.7  
1000  Quintic  93.3  86.7  96.7  100.0  100.0  96.7  
1000  Sextic  90.0  76.7  100.0  90.0  90.0  100.0  
1000  Quartic Uneq.  66.7  50.0  43.3  60.0  50.0  60.0 
As an example of the ability of the adaptive Tarpeian method to dynamically control the dynamics of program sizes, in Figure .3 we show the median of the median program size recorded in 10 independent runs with the Quintic polynomial problem with adaptive Tarpeian control with . Runs lasted 200 generations. Population size was 1000. Tournament size was 2. All other parameters were the same as in the other polynomial regression problems.
Braincomputer interfaces convert signals generated from the brain into control commands for devices such as computers. Such systems are often based on the analysis of brain electrical activity obtained via electroencephalography (EEG). Within EEGbased BCIs, the analysis of event related potentials (ERPs) has been particularly effective. ERPs are electrical components of brain activity produced in response to external stimuli. Some such ERPs can be used to determine user intentions in BCIs by relating their presence to specific external stimuli.
Given the pointandclick nature of most modern user interfaces, an important application of BCI is controlling 2D pointer movements. Some initial success in developing an ERPbased mouse has been reported in [4] and a variety of alternatives to this scheme were explored in [14]. However, in both cases only semilinear transformations were used to transform EEG signals into mouse movements. These have obvious limitations, particularly in relation to noise and artefact rejection. So, we wondered if GP, with its ability to explore the huge space of computer programs, could produce even more powerful transformations while also performing feature selection and artefact handling at the same time.
Our BCI mouse uses visual displays showing 8 circles arranged around a circle at the centre of the display as in Figure .4(far left). Each circle represents a direction of movement for the mouse cursor. Circles temporarily changed colour  from grey to either red or green  for a fraction of a second. We will call this a flash. The aim was to obtain mouse control by mentally focusing on the flashes of the stimulus representing the desired direction and mentally naming the colour of the flash. Flashes lasted for 100 ms and the interstimulus interval was 0 ms. Stimuli flashed in clockwise order. This meant that the interval between two target events for the protocol was 800 ms. We used a black background, grey neutral stimuli and either red or green flashing stimuli.
Data from 8 participants with an average age of 33 were used. They all had normal or corrected normal vision except for subject 5 who had strabismus. Each session was divided into runs, which we will call direction epochs. Each participant carried out 16 direction epochs, this resulted in the 8 possible directions being carried out twice.
Each direction epoch started with a blank screen and after 2 seconds the eight circles appeared near the centre of the screen. A red arrow then appeared for 1 second pointing to the target (representing the direction for that epoch). Subjects were instructed to mentally name the colour of the flashes for that target. After 2 seconds the flashing of the stimuli started. This stopped after 20 to 24 trials, with a trial consisting of the sequential activation of each of the 8 circles. In other words each direction epoch involves between and flashes. After the direction epoch had been completed, subjects were requested to verbally communicate the colour of the last target flash.
Data were sampled at 2048 Hz from 64 electrode sites using a BioSemi ActiveTwo EEG system. The EEG channels were referenced to the mean of the electrodes placed on either earlobe.
The data from each subject were used to train an ensemble of 2 hardmargin linear SupportVector Machines (SVMs) for that subject. No channel selection was performed, and, so, all 64 channels were used. The data were filtered between 0.15 and 30 Hz and downsampled to 128 Hz. Then, from each channel an 800 ms epoch was extracted which was further decimated to 32 Hz.
The SVM classification results were estimated using 16 fold crossvalidations, with each direction epoch being a crossvalidation fold. After training, the output of each SVM can be interpreted as a degree of targetness of a stimulus as inferred from the feature vector associated with the stimulus.
The data produced by the SVM can be used to control the mouse. If is the angle corresponding to the last flashed stimulus ( for the North direction, for the NorthEast direction, etc.) and is the corresponding score produced by the SVM, it is sufficient to apply a displacement to the mouse cursor to obtain a reasonable trajectory. This is because the SVM tends to produce positive scores for epochs that represent targets and negative values for epochs that represent nontargets. However, the trajectories of the mouse thus produced are very convoluted and suffer enormously from the artifacts produced by swallowing, eye blinks and neck tension (which can easily generate voltages which are one or two orders of magnitude bigger than ordinary EEG signals). Examples of normal and artefactual trajectories produced by the raw SVM output are shown in Figure .5.
While the acquisition of data to train the SVMs of a subject and the training itself require only five minutes or so, making an SVM approach rather interesting, the particularly convoluted nature of the trajectories and the sensitivity of the SVMs to artifacts make the direct control of SVM raw output problematic. We, therefore, thought about using a program evolved by GP to postprocess the output produced by the SVMs and to integrate this with the raw information available from the 64 EEG channels so as to straighten the trajectories produced by the raw SVM output and to make them insensitive to artifacts.
Naturally, GP cannot compete with the speed of SVMs and we cannot imagine subjects waiting for hours while GP finds a solution which suits their particular brain signals. So, the solution evolved by GP has to be general and reusable across subjects and across experiments.
For these experiments we used the same GP system as before, but in this case we switched on its stronglytyped features. Also, since fitness evaluation in this domain of application is extremely computationally intensive, we also switched on its parallel feature which allows one to perform fitness evaluations across multiple CPU cores (via farming).
As before the system uses a steadystate update policy. It evolves a population of 1,000 individuals with tournament selection with a tournament size of 5, a stronglytyped version of the grow method with a maximum initial depth of 4, and stronglytyped versions of subtree crossover and subtree mutation. Both operators are applied with a 50% rate and use a uniform selection of crossover/mutation points. The system uses the primitive set shown in Table .3. Program trees were required to have a Float return type.
Primitive  Output Type  Input Type(s)  Functionality 
0.5, 1, 2, 5, 10, 0, , 25  Float  None  Floating point constants used for numeric calculations and array indexing 
Fp1, AF7, AF3, F1, ... (60 more channels)  Array  None  Returns an array of 26 samples following a flash from one of the channels. The samples are of type Float. 
SVM  Float  None  Raw output of SVM ensemble. 
targetQ1, targetMed, targetQ3, ... (6 more for nontargets and neartargets)  Float  None  First quartile, median and third quartile of the SVM scores for targets, neartargets and nontargets for a subject. 
+, , *, min, max  Float  (Float, Float)  Standard arithmetic operations plus maximum and minimum on floats. 
,  Bool  (Float, Float)  Standard relational operations on floats 
if  Float  (Bool, Float, Float)  Ifthenelse function. If the first argument evaluates to True, then the result of evaluating its second argument is returned. Otherwise the result of evaluating the third argument is returned. 
abs  Float  Float  Returns the absolute value of a Float. 
mean, median, std, Amin, Amax  Float  (Float, Float, Array)  Given a 26sample array and two floats, treat the floats as indices for the array by casting them to integer and applying a modulus 26 operation. Then compute the mean, median, standard deviation, minimum or maximum, respectively, of the samples in the array falling between such indices (inclusive). 
With this setup we performed runs of 50 generations. For Tarpeian we used a threshold and . The threshold was chosen based on the available computational resources and on the need to produce a reasonably analysable result. We did several runs with this system. However, because of the extreme computational load required by our fitness evaluation and the complexity of the problem, here we only report the results of our best run. The run took approximately 6 CPU days to complete.
The fitness function we used measures the dissimilarity between the ideal trajectory and the actual trajectory produced by a program averaged over the direction epochs in our training set. The training set was formed by 80 direction epochs, obtained by taking the 39 most noisy direction epochs out of the 128 available from our subjects and mixing them with 41 less noisy direction epochs. Since each direction epoch consisted in an average of 176 trials and there were 80 such epochs, measuring the fitness of a program required executing the program for over 14,000 times. Being an error measure, fitness is, naturally, minimised in our system. We describe its elements below.
The actual trajectory produced by a program on a training epoch is obtained by iteratively evaluating the program, each time feeding the samples relating to a new flash into the Fp1, AF7, etc. terminals (which effectively act as a sliding window on the EEG) as well as the raw output produced on that epoch by the SVM ensemble. The output of the program, which, as noted above, is of type Float, is multiplied by a unit vector representing the direction corresponding to the stimulus that flashed on the screen. This produces a result of the form which is used as a displacement to be applied to the current mouse position.
As illustrated in Figure .6, the ideal trajectory for each direction epoch is obtained by sampling at regular intervals the line segment connecting the origin to a point along the desired direction. The point is chosen by computing the expected distance reached by the linear SVM trained on the data for subject. The ideal trajectory is sampled in such a way to have the same number of samples as the actual trajectory. The comparison between actual and ideal trajectory is then a matter of measuring the Euclidean distance between pairs of corresponding points in the two trajectories and taking an average. Notice that any detours from the ideal line and any slowdowns in the march along it in the actual trajectory are strongly penalised with our fitness measure.
In order to test the generality of evolved solutions we used a further 80 direction epochs obtained from a separate set of 5 subjects.
Figure .7 shows the dynamics of median and best program's fitness as well as of median and best program's size in our runs. As one can immediately see, essentially there is no bloat, while fitness continues to improve, albeit quite slowly, for the 50 generations of the run.
The best evolved program is presented in tree form in Figure .8. To evaluate its performance we will compare its output to the output produced by the raw SVM data on our independent test set of 5 subjects.
Let us start from a qualitative analysis of the behaviour of this program. Figure .9(top) shows the output produced by the SVM ensemble for each of the direction epochs in the test set of 5 subjects (after the transformation of the scores into displacements). Note how convoluted trajectories are. Clearly while the SVMs get a good portion of the job done, namely turning the EEG signals into trajectories that are in approximately the right direction, they leave much to desire in terms of usability. Figure .9(bottom) shows the corresponding trajectories produced by our best evolved programs. Qualitatively it is clear that these trajectories are far less convoluted. Also, their end points appear to cluster more precisely near the desired directions of motion.

To quantitatively verify these observations, Table .4 shows a statistical comparison between the trajectories produced by GP and those produced by the SVMs in terms of mean, median, standard deviation and standard error of the mean of the distances between the ideal trajectory and the actual trajectory recorded in each of the 80 direction trials in our test set of 5 subjects. As we can see the evolved program produces trajectories which are on average closer to the ideal line than the corresponding trajectories produced by the SVMs. The differences are highly statistically significant as shown by a onesided Wilcoxon signed rank test for paired data (value 0.0009).
Program  Mean  Median  Standard Deviation  Standard Error 
GP  44.3389  41.5974  18.1760  2.0450 
SVM  51.5003  50.9928  21.6596  2.4369 
In this paper we have presented a variation of the Tarpeian method's for bloat control that allows the adaptation of the method main parameter  its rate of application  in such a way to guarantee that programs don't grow bigger than a given threshold. While this threshold is usually a constant, it can also be dynamically varied during evolution resulting in program sizes approximately tracking it.
An earlier method  the covariant Tarpeian method introduced last year [9]  can solve the problem of optimally setting analytically. However, the method is only applicable to generational systems using fitness proportionate selection and relying only on crossover as their genetic operator. While the adaptive Tarpeian method is algorithmic and control of program size is asymptotic, it does not suffer from such limitations as we have demonstrated testing it with a steady state GP system with tournament selection and using a mix of subtree crossover (which is sizeunbiased) and subtree mutation (which is sizebiased).
We tested the method on a variety of standard benchmark problems as well as in a realworld application  the evolution of a Brain Computer Interfaces mouse controller. Results with benchmark problems indicate that the method is very effective at keeping the size of programs under control without incurring any particular penalty in terms of endofrun fitness achieved. Runs, however, lasted only a fraction of the time required without bloat control.^{4} The results in the BCI mouse problem were very good, the system being able to evolve a controller which is significantly better than a solution based on support vector machines. The solution also generalised very well to unseen data.
In this paper we did not directly compare the adaptive Tarpeian method with the nonadaptive version of the method. This is because our main objective was not to show that the former does better than the latter at controlling the growth of program sizes: we believe the original Tarpeian method will also work well when appropriately tuned. Instead, our aim was explore the possibility of automatically adjusting the rate of application of the Tarpeian method, , so that one would have a direct (thought ) rather than indirect (through ) control over program sizes.
Why is this important? The work presented in this paper was originally motivated by the needs for bloat control in a realworld problem  the BCI mouse problem. In a domain such as this, where one cannot afford to perform production runs with different values of until satisfactory control over program sizes is achieved, having a method where on can directly set the maximum program size is essential. We believe many other realworld problems may benefit from this feature of the adaptive Tarpeian method.
We would like to thank Luca Citi and the Engineering and Physical Sciences Research Council (grant EP/F033818/1).