Luca Citi, Riccardo Poli,
and Caterina Cinel
School of Computer Science and Electronic Engineering,
University of Essex,
Wivenhoe Park, CO4 3SQ, UK
The electrical activity of the brain is typically recorded from the scalp using electroencephalography (EEG). This is used in electrophysiology and psychology, as well as in brain-computer interface (BCI) research. Particularly important for these purposes are event-related potentials (ERPs). ERPs are relatively well-defined variations in the shape of the ongoing EEG that are elicited by a stimulus and/or temporally linked to it [19]. ERPs include early exogenous responses, due to the primary processing of the stimulus, as well as later endogenous responses, which reflect the higher cognitive processing induced by the stimulus [9].
While the study of single-trial ERPs has been considered of great importance since the early days of ERP analysis, in practice the presence of noise and artifacts has forced researchers to adopt averaging as part of their standard investigation methodology [10]. Even today, despite enormous advances in acquisition devices and signal-processing equipment and techniques, ERP averaging is still ubiquitous [14,19].
ERP averaging is also a key element in many BCIs. BCIs measure specific signals of brain activity, intentionally and unintentionally induced by the participant, and translate them into device control signals (see, for example, [12,31,21,1,30,13]). Averaging is frequently used to increase accuracy in BCIs where the objective is to determine which of the stimuli sequentially presented to a user is attended. This is achieved via the classification of the ERP components elicited by the stimuli. This form of BCI, which effectively started with the seminal work of [12], who showed that it was possible to spell words through the detection of P300 waves, is now one of the most promising areas of the discipline (e.g., see [3,25,7]).
Averaging has empirically been shown to improve accuracy in ERP-based BCIs. However, the larger the number of trials that need to be averaged, the longer it takes for the system to produce a decision. So, only a limited number of trials can be averaged before a decision has to be taken. A limitation on the number of trials one can average is also present in psychophysiological studies based on ERPs: the larger the number of trials accumulated in an average, the longer an experiment will last, potentially leading to participant fatigue, to increases in noise due to variations in electrode impedances, etc. So, both in psychophysiological studies and in BCIs it would be advantageous to make the absolute best use of all the information available in each trial. However, as we will discuss in Section 2, standard averaging techniques do not achieve this.
In recent work [22] we proposed, tested and theoretically analysed an extremely simple technique which can be used in forced-choice experiments, where response times are measured via a button press or a mouse click. Our technique consists of binning trials based on response times and then averaging. This can significantly alleviate the problems of other averaging methods, particularly when response times are relatively long. In particular, results indicated that the method produces clearer representations of ERP components than standard averaging, revealing finer details of components and helping in the evaluation of the true amplitude and latency of ERP waves.
The technique relies on dividing an ERP dataset into bins. The size and position of these bins are extremely important in determining the fidelity with which bin averages represent true brain waves. In [22] we simply used standard (mutually exclusive) bins. That is, each bin covered a particular range of response times, the ranges associated with different bins did not overlap, and no gaps were allowed between the bins. As we will explain in Section 3, this implies that, in bin averages, true ERP components are distorted via the convolution with a kernel whose frequency response is itself a convolution between the frequency response of the original latency distribution and the Fourier transform of a rectangular window (a sinc function).
While provably this has the effect of improving the resolution with which ERPs can be recovered via averages, it is clear that the convolution with a sinc will produce distortions due to the Gibbs phenomenon. Also, the width and position of the bins used in [22] were determined heuristically. We chose bins as follows: one gathered the lowest 30% of the response-time distribution, one the middle 30% and one the longest 30%. However, it is clear that neither the choice of crisp, mutually exclusive membership functions for bins (leading to the convolution with a sinc) nor the position and width of the bins is optimal.
So, although our binning method is a marked improvement over traditional techniques, it still does not make the best use of the information available in an ERP dataset. It is arguable, for example, that binning with gradual membership functions would provide even better ERP reconstruction fidelity. Similarly, setting the size of the bins on the basis of the noise in the data and the particular shape of the response-time distribution would help make the best use of the available trials. Finding bin membership functions which satisfy these criteria, however, is difficult. It is also difficult to specify what notion of optimality one should use. In this paper we solve both problems.
The paper is organised as follows. After the reviews of previous work provided in Sections 2 and 3, we define what an optimal set of binning functions is (Section 4). As we will see, this involves the use of statistical tests on the data belonging to different bins. Then (Section 5), we apply genetic programming (GP) [23] to the task of identifying optimal membership functions for bins in such a way as to obtain the best possible reconstruction of real ERP components from bin averages. The results of this process, described in Section 6, provide significant improvements over the original technique. We give some conclusions and indications of future work in Section 7.
There are essentially three classes of methods commonly used to resolve ERP components via averaging. Stimulus-locked averaging requires extracting epochs of fixed duration from the EEG signal starting at the stimulus presentation and averaging the corresponding ERPs [17]. An important problem with this form of averaging is that any ERP components whose latency is not phase-locked with the presentation of the stimuli may be significantly distorted as a result of averaging [26,19]. This is because the average, $\bar{x}(t)$, of randomly shifted versions of a waveform, $c(t)$, is the convolution between the original waveform and the latency distribution, $f_L$, for that waveform, i.e., $\bar{x}(t) = (c * f_L)(t)$; e.g., see [32]. This typically means that a stimulus-locked average can only show a smoothed (low-pass filtered) version of each variable-latency component.
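The convolution relation can be checked numerically. The sketch below (using an illustrative Gaussian waveform and uniformly distributed latencies of our own choosing, not data from the experiments) verifies that the stimulus-locked average of randomly shifted trials equals the circular convolution of the waveform with the empirical latency distribution, and that the average is a flattened version of the true component:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "ERP component": a Gaussian bump (illustrative, not real EEG).
t = np.arange(1024)
component = np.exp(-((t - 300) / 20.0) ** 2)

# Draw random latencies from a broad distribution and build shifted trials.
latencies = rng.integers(0, 200, size=5000)
trials = np.stack([np.roll(component, k) for k in latencies])

# The stimulus-locked average...
avg = trials.mean(axis=0)

# ...matches the circular convolution of the waveform with the empirical
# latency distribution (computed here via FFT).
hist = np.bincount(latencies, minlength=1024) / len(latencies)
predicted = np.real(np.fft.ifft(np.fft.fft(component) * np.fft.fft(hist)))

assert np.allclose(avg, predicted, atol=1e-6)
# The average is a much flatter (low-pass filtered) version of the component:
assert avg.max() < 0.5 * component.max()
```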
The problem is particularly severe when the task a subject needs to perform after the presentation of the stimuli is relatively difficult, since the variability in the latencies of endogenous ERP components and in response times increases with the complexity of the task [19,24]. In these cases, multiple endogenous variable-latency components may appear as a single large blurred component in the average; a synthetic example is shown in Figure 1 (left). This makes it very difficult to infer true brain area activity for any response occurring after the early exogenous potentials typically elicited by (and synchronised with) a stimulus.

In experiments in which the task requires participants to provide a specific behavioural response (e.g., in the form of a button press or a spoken response), response-locked averaging can be used as an alternative to stimulus-locked averaging to help resolve variable-latency ERP components that are synchronised with the response; see, for example, [18,16,26,27]. In this case, however, the early responses associated and phase-locked with the stimulus will end up being blurred and hard to distinguish, since they are represented in the average by the convolution of their true waveform with the response-time distribution; see [32]. A synthetic example illustrating this problem is shown in Figure 1 (right).
Thus, inferring whether a component in an average represents a true effect or is due to averaging biases can be very difficult. Note that averaging more data does not help increase the fidelity of the reconstructed signals, because there is a systematic error in the averaging process.
A third alternative to resolve variable-latency waves is to attempt to identify such components in each trial and estimate their latency. Then, shifting trials on the basis of estimated latencies and averaging may bring out the desired component from its noise background. However, most methods of this kind require prior knowledge of what type of component to expect and at what times. What if this knowledge is not available? Without this information, automated detection algorithms have very little hope of finding the latency of the waves of interest. Also, latency detection algorithms assume that the component of interest is present in every trial and that we just need to find its latency in each trial. What if an ERP component is not always elicited by the stimuli? The presence of a component might be, for example, condition-dependent, or dependent on whether or not a participant attended a stimulus, whether a participant was rested or tired, whether there was habituation to the stimuli, etc. [2,28]. If a component was absent frequently, running a latency-measuring algorithm on trials where the component did not occur would inundate the averaging process with bias and noise. And, unfortunately, thresholds or even more sophisticated algorithms for the detection of the presence of the component, which in principle could be used to properly handle trials that do not contain it, produce large numbers of misclassification errors. So, the composition of detection errors with latency-estimation errors may render component-locked averaging very unreliable in many situations.
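A minimal sketch of this component-locked strategy, under exactly the idealised assumptions criticised above (the component is present in every trial and a correct template is known), might estimate latencies by cross-correlation with the template, realign, and average:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative template and synthetic trials: the component is present in
# every trial, shifted by a random latency and buried in Gaussian noise.
t = np.arange(512)
template = np.exp(-((t - 100) / 15.0) ** 2)

shifts = rng.integers(0, 150, size=200)
trials = np.asarray([np.roll(template, k) + rng.normal(0, 0.5, 512)
                     for k in shifts])

# Latency estimate: the lag maximising the circular cross-correlation
# between each trial and the template (computed via FFT).
spec_t = np.conj(np.fft.fft(template))
lags = np.array([np.argmax(np.real(np.fft.ifft(np.fft.fft(x) * spec_t)))
                 for x in trials])

# Realign on the estimated latencies and average: the component re-emerges,
# whereas the stimulus-locked average smears it out.
aligned = np.stack([np.roll(x, -lag) for x, lag in zip(trials, lags)])
component_locked = aligned.mean(axis=0)
stimulus_locked = trials.mean(axis=0)
assert component_locked.max() > stimulus_locked.max()
```

When the component is missing from many trials, the same argmax still returns a lag (of pure noise), which is precisely how bias floods the component-locked average.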
Note also that all methods that realign trials based on component latencies can potentially suffer from a clear-centre/blurred-surround problem. That is, after shifting trials based on the latency of a particular ERP component, all instances of that component will be synchronised, thereby effectively becoming fixed-latency elements. However, stimulus-locked components will now become variable-latency components. Also, all (other) components that are phase-locked with some other event (e.g., the response), but not with the component of interest, will remain variable-latency. Not surprisingly, then, they will appear blurred and distorted in a component-locked average.
It is clear that the standard averaging techniques reviewed above are not entirely satisfactory and that a more precise and direct way of identifying variable-latency components, as well as measuring their latency and amplitude, is needed. In the following section we describe the binning technique we developed in [22], which significantly improves on previous methods.
In [22] we proposed an extremely simple technique: binning trials based on their recorded response time and then applying averaging to the bins. This has the potential of solving the problems with the three main ways of performing averages (stimulus-locked, component-locked and response-locked) discussed above, effectively reconciling the three methods. In particular, response-time binning allows one to significantly improve the resolution with which variable-latency waves can be recovered via averaging, even if they are distant from the stimulus-presentation and response times. The reason for this is simple to understand from a qualitative point of view.
The idea is that if one selects out of a dataset all those epochs where a participant was presented with qualitatively identical stimuli and gave the same response within approximately the same amount of time, it is reasonable to assume that similar internal processes will have taken place. So, within those trials, ERP components that would normally have a widely variable latency might be expected, instead, to present a much narrower latency distribution, i.e., they should occur at approximately the same time in the selected subset of trials. Thus, if we bin epochs on the basis of stimuli, responses and response times, we find that, for the epochs within a bin, the stimulus, the response, and both fixed- and variable-latency components are much more synchronised than if one did not divide the dataset. Averaging such epochs should, therefore, allow the rejection of noise while reducing the undesirable distortions and blurring (the systematic errors) associated with averaging. Response-time binning and averaging should thus produce clearer, less biased descriptions of the activity which really takes place in the brain in response to the stimuli, without the need for prior knowledge of the phenomena taking place and the ERP components present in the EEG recordings.
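As a rough illustration of the procedure (with synthetic data, and with tercile boundaries standing in for the actual quantile choices of [22]), response-time binning amounts to:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical epoched dataset: one row of 614 samples per trial, plus a
# response time per trial (sizes and values are illustrative only).
n_trials, n_samples = 300, 614
epochs = rng.normal(size=(n_trials, n_samples))
response_times = rng.uniform(0.4, 2.0, size=n_trials)

# Crisp response-time binning in the spirit of [22]: three quantile-based
# bins, then a separate ERP average per bin.
edges = np.quantile(response_times, [0.0, 1 / 3, 2 / 3, 1.0])
bin_averages = []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (response_times >= lo) & (response_times <= hi)
    bin_averages.append(epochs[mask].mean(axis=0))

# Each bin average is an ERP-like waveform of the original epoch length.
assert len(bin_averages) == 3
assert all(a.shape == (n_samples,) for a in bin_averages)
```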
In [22] we assessed the binning technique both empirically and theoretically. For empirical validation we modified and used an experiment originally designed by [11], in which the task was relatively difficult, since target detection was based on specific combinations of multiple features (i.e., requiring feature binding), and where response times varied from around 400 ms to over 2 seconds. We evaluated the empirical results in a number of ways, including: (a) a comparison between stimulus-locked and response-locked averages, which showed that these are essentially identical under response-time binning; (b) an analysis of differences between bin means, medians and quartiles of the amplitude distributions, and an analysis of the statistical significance of amplitude differences using Kolmogorov-Smirnov tests, which showed that bins indeed captured statistically different ERP components; and (c) an analysis of the signal-to-noise ratio (SNR) with and without binning, which showed that the (expected) drop in SNR due to the smaller dataset cardinality associated with bins is largely compensated by a corresponding increase due to the reduction in systematic errors.
From the theoretical point of view, we provided a comprehensive analysis of the resolution of averages with and without binning, which showed that there are resolution benefits in applying response-time binning even when substantial variability in the latency of variable-latency components remains after binning. We summarise this analysis below, since it is the starting point for our fitness function, as we will show in Section 4.
Let us assume that there are three additive components in the ERPs recorded in a forced-choice experiment: a stimulus-locked component, $c_s(t)$, a response-locked component, $c_r(t)$, and a variable-latency component, $c_v(t)$. Let $R$ be a stochastic variable representing the response time in a trial and let $f_R(r)$ be its density function. Similarly, let $L$ be a stochastic variable representing the latency of the variable-latency component and let $f_L(\ell)$ be the corresponding density function. Let us further assume that response time and latency do not affect the shape of these components. Under these assumptions we obtain the following equation for the stimulus-locked average $\bar{x}(t)$:
$$\bar{x}(t) = c_s(t) + \int c_r(t - r)\, f_R(r)\, \mathrm{d}r + \int c_v(t - \ell)\, f_L(\ell)\, \mathrm{d}\ell .$$
Let us consider the most general conditions possible. Let $L$ and $R$ be described by an unspecified joint density function $f_{L,R}(\ell, r)$. So, the latency and response-time distributions are marginals of this joint distribution:
$$f_L(\ell) = \int f_{L,R}(\ell, r)\, \mathrm{d}r, \qquad f_R(r) = \int f_{L,R}(\ell, r)\, \mathrm{d}\ell .$$
In [22] we showed that if one considers a classical ``rectangular'' bin $b$ collecting the subset of the trials having response times in the interval $[b_s, b_e]$, i.e., such that $b_s \le R \le b_e$, the joint distribution of $L$ and $R$ transforms into
$$f_{L,R}^{(b)}(\ell, r) = \frac{f_{L,R}(\ell, r)\, \mathbf{1}_{[b_s,b_e]}(r)}{\Pr\{b_s \le R \le b_e\}},$$
where $\mathbf{1}_{[b_s,b_e]}$ is the indicator function of the interval $[b_s, b_e]$.
We also showed that taking the marginal of this distribution w.r.t. $\ell$ gives us the response-time distribution for response-time bin $b$:
$$f_R^{(b)}(r) = \frac{f_R(r)\, \mathbf{1}_{[b_s,b_e]}(r)}{\Pr\{b_s \le R \le b_e\}} .$$
The key difference between $f_R$ and $f_R^{(b)}$ is that, apart from a scaling factor, $f_R^{(b)}$ is the product of $f_R$ and a rectangular windowing function, $\mathbf{1}_{[b_s,b_e]}$. In the frequency domain, therefore, the spectrum of $f_R^{(b)}$, which we denote with $F_R^{(b)}$, is the convolution between the spectrum of $f_R$, denoted as $F_R$, and the spectrum of a translated rectangle. This is a scaled and rotated (in the complex plane) version of the sinc function (i.e., it behaves like $\sin(\omega)/\omega$). This function has a large central lobe whose width is inversely proportional to the bin width $b_e - b_s$. Thus, when convolved with $F_R$, it behaves as a low-pass filter. As a result, $F_R^{(b)}$ is a smoothed and enlarged version of $F_R$. In other words, while $f_R^{(b)}$ is still a low-pass filter, it has a higher cutoff frequency than $f_R$. So, it provides a less blurred representation of variable-latency components than $f_R$.
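This spectral argument can be illustrated numerically. In the sketch below (with an arbitrary Gaussian-shaped response-time density of our own choosing), windowing the density to a bin widens its spectrum, i.e., the bin's density acts as a milder low-pass filter than the full density:

```python
import numpy as np

# Illustrative response-time density on a 1024-point grid over [0, 2] s.
t = np.linspace(0, 2.0, 1024)
f_r = np.exp(-((t - 1.0) / 0.3) ** 2)
f_r /= f_r.sum()

# Rectangular bin [0.8 s, 1.0 s]: the bin's density is the windowed,
# renormalised version of f_r.
window = ((t >= 0.8) & (t <= 1.0)).astype(float)
f_b = f_r * window
f_b /= f_b.sum()

def bandwidth(density):
    # Crude half-power bandwidth: number of FFT bins above half the DC level.
    s = np.abs(np.fft.rfft(density))
    return int(np.sum(s > 0.5 * s[0]))

# The (narrower in time) binned density has the wider spectrum, hence the
# higher cutoff frequency when it smears ERP components.
assert bandwidth(f_b) > bandwidth(f_r)
```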
We will modify this analysis in the next section for the purpose of defining a suitable fitness measure, the optimisation of which leads to maximising the statistical significance with which ERP components can be reconstructed via binning and averaging.
As described in the previous section, in [22] we used the indicator function $\mathbf{1}_{[b_s,b_e]}(r)$ to bin trials. To get the best out of the binning technique, here we will replace this function with a probabilistic membership function which gives the probability that a trial characterised by a response time $r$ is accepted into a particular bin $b$. Let us denote this probabilistic membership function as $m_b(r)$.
Let us denote with $A_b$ a binary stochastic variable representing the event {accept trial for averaging in bin $b$}. Let $f_{L,R,A_b}$ be the joint distribution of the events $L = \ell$, $R = r$ and $A_b = 1$. Since acceptance depends only on the response time, this can be decomposed as follows:
$$f_{L,R,A_b}(\ell, r, 1) = \Pr\{A_b = 1 \mid R = r\}\, f_{L,R}(\ell, r) = m_b(r)\, f_{L,R}(\ell, r) .$$
Focusing our attention on the subset of the trials falling within bin $b$, we obtain the following joint distribution of $L$ and $R$:
$$f_{L,R}^{(b)}(\ell, r) = \frac{m_b(r)\, f_{L,R}(\ell, r)}{\int m_b(r')\, f_R(r')\, \mathrm{d}r'} .$$
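In practice, a probabilistic bin can be formed by accepting each trial independently with probability $m_b(r)$, so the bin's composition is itself stochastic. The sketch below uses a hypothetical Gaussian-bump membership function (the real ones are evolved by GP in Section 5) and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical smooth membership function for one bin: a Gaussian bump
# centred at 0.9 s (an assumption for illustration only).
def m_b(r):
    return np.exp(-((r - 0.9) / 0.15) ** 2)

# Synthetic dataset with the cardinalities used later in the paper.
response_times = rng.uniform(0.4, 2.0, size=2670)
epochs = rng.normal(size=(2670, 614))

# Each trial enters the bin independently with probability m_b(r).
accept = rng.random(2670) < m_b(response_times)
bin_average = epochs[accept].mean(axis=0)

assert 0 < accept.sum() < 2670      # a proper, stochastic subset
assert bin_average.shape == (614,)
```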
Naturally, one will generally use multiple probabilistic response-time bins for the purpose of analysing ERP trials. For each, a membership function must be defined. Our objective is to use GP to discover these membership functions in such a way as to maximise the information extracted from the raw data. To do so, we need to define an appropriate fitness function.
While we form bins based on response times, each data element in a bin actually represents a fragment of EEG signal recorded at some electrode site. The question we need to ask is: what do we mean by extracting maximal information from these data? Naturally, alternative definitions are possible. Here we focus on obtaining ERP averages that are maximally statistically different.
An ERP bin average is effectively a vector, each element of which is the signal amplitude recorded at a particular time after stimulus presentation averaged over all the trials in a bin. Because we use probabilistic membership functions for the bins, the composition of a bin is in fact a stochastic variable. Let us denote the stochastic variable representing bin $b$ with $B_b$. The probability distribution of $B_b$ is determined by the membership function $m_b$ and by the response-time distribution $f_R$. An instantiation of $B_b$, $\beta_b$, is effectively an array with as many rows as there are trials in bin $b$ and as many columns as there are time steps in each epoch. An element of $\beta_b$ represents the voltage amplitude recorded in a particular trial, at a particular time step within that trial, at the chosen electrode. Let $\beta_b(t)$ represent the set of the amplitudes recorded at time $t$ in the trials in bin $b$.
Let us consider two bins, $b$ and $b'$. If $\beta_b$ is an instantiation of $B_b$ and $\beta_{b'}$ is an instantiation of $B_{b'}$, one could check whether the signal amplitude distributions recorded in bins $b$ and $b'$ at a particular time step $t$ are statistically different by applying the Kolmogorov-Smirnov test for distributions to the datasets $\beta_b(t)$ and $\beta_{b'}(t)$. The test returns a p-value, which we will call $p_{b,b'}(t)$. The smaller $p_{b,b'}(t)$, the better the statistical separation between the signal amplitude distributions in bins $b$ and $b'$ at time step $t$. Naturally, to get an indication of how statistically different the ERPs in different bins are, one then needs to somehow integrate the $p_{b,b'}(t)$ values obtained at different $t$'s and for different pairs of bins.
Since we are interested in obtaining bins (via the optimisation of their membership functions $m_b$) which contain maximally mutually statistically different trials, we require that the sum of the p-values returned by the Kolmogorov-Smirnov test when comparing the signal amplitudes in each pair of bins over the time steps in an epoch be as small as possible. So, we want to maximise the following fitness function:
$$f = 1 - \frac{1}{|P| \, T} \sum_{(b,b') \in P} \sum_{t=1}^{T} p_{b,b'}(t),$$
where $P$ is the set of bin pairs and $T$ is the number of time steps considered.
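A sketch of this fitness computation, using `scipy.stats.ks_2samp` and assuming the simple "1 minus mean p-value" normalisation suggested by the numbers reported in Section 6, is:

```python
import numpy as np
from scipy.stats import ks_2samp

def fitness(bins):
    """bins: list of 2-D arrays (trials x time steps), one array per bin.

    Returns 1 minus the mean Kolmogorov-Smirnov p-value over all bin
    pairs and time steps (assumed normalisation; [22] may differ).
    """
    p_values = []
    for i in range(len(bins)):
        for j in range(i + 1, len(bins)):
            for t in range(bins[i].shape[1]):
                p_values.append(ks_2samp(bins[i][:, t], bins[j][:, t]).pvalue)
    return 1.0 - float(np.mean(p_values))

rng = np.random.default_rng(3)
# Bins with well-separated amplitude distributions score close to 1.
separated = [rng.normal(0.0, 1.0, (50, 10)), rng.normal(3.0, 1.0, (50, 10))]
assert fitness(separated) > 0.95
```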
We did our experiments using a linear register-based GP system. The system uses a steady-state update schedule.
The primitive set used in our experiments is shown in Table 1. The instructions refer to four registers: the input register ri, which is loaded with the response time of a trial before a program is evaluated; the two general-purpose registers r0 and r1, which can be used for numerical calculations; and the register rs, which can be used as a swap area. r0, r1 and rs are initialised to 0. The output of the program is read from r0 at the end of its execution. In the addition and multiplication instructions we used the memory-with-memory technique proposed in [20] with a memory coefficient of 0.5. So, for example, the instruction r0 ← r0 + ri is actually implemented as r0 = 0.5 * r0 + 0.5 * (r0 + ri), while r1 ← r0 * r1 is implemented as r1 = 0.5 * r1 + 0.5 * (r0 * r1).
NOP        | r0 ← 1        | r1 ← r0 + r1
r0 ← 0     | r1 ← 1        | r0 ← r0 * r1
r1 ← 0     | r0 ← −r0      | r1 ← r0 * r1
r0 ← 0.5   | r1 ← −r1      | r0 ← r0 * r0
r1 ← 0.5   | r0 ← r0 + ri  | r1 ← r1 * r1
r0 ← 0.1   | r1 ← r1 + ri  | rs ↔ r0
r1 ← 0.1   | r0 ← r0 + r1  | rs ↔ r1
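The interpreter semantics described above can be sketched as follows (the tuple-based instruction encoding is our own; per the text, memory-with-memory applies to additions and multiplications, while constant loads are assumed to be direct assignments):

```python
# Sketch of the linear register-based interpreter with memory-with-memory
# writes (memory coefficient 0.5) on arithmetic instructions.
GAMMA = 0.5

def run(program, response_time):
    regs = {"ri": response_time, "r0": 0.0, "r1": 0.0, "rs": 0.0}

    def mwm_write(dst, value):
        # memory-with-memory: dst <- 0.5*dst + 0.5*value
        regs[dst] = GAMMA * regs[dst] + (1.0 - GAMMA) * value

    for instr in program:
        op = instr[0]
        if op == "nop":
            pass
        elif op == "load":                      # e.g. ("load", "r0", 0.5)
            regs[instr[1]] = instr[2]           # assumed direct assignment
        elif op == "add":                       # e.g. ("add", "r0", "r0", "ri")
            mwm_write(instr[1], regs[instr[2]] + regs[instr[3]])
        elif op == "mul":
            mwm_write(instr[1], regs[instr[2]] * regs[instr[3]])
        elif op == "swap":                      # e.g. ("swap", "rs", "r0")
            regs[instr[1]], regs[instr[2]] = regs[instr[2]], regs[instr[1]]
    return regs["r0"]                           # output register

# r0 <- r0 + ri with ri = 2.0: 0.5*0 + 0.5*(0 + 2.0) = 1.0, as in the text.
assert run([("add", "r0", "r0", "ri")], 2.0) == 1.0
```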
As in [22], in our tests we consider three bins. So, we need to evolve three membership functions, which we will call $m_1(r)$, $m_2(r)$ and $m_3(r)$. To help GP in this difficult task we constrained the family of functions from which the membership functions could be drawn. So, instead of evolving the three functions directly, we decomposed each function into three components and asked GP to evolve the components used in the formulation of each $m_b$. So, each GP individual was actually made up of nine programs. All nine must be run to decide with which probability an element of an ERP dataset should belong to each response-time bin.
More specifically, each membership function was obtained by combining the outputs of its three evolved component programs.
The system initialised the population as follows. All nine programs in an individual had identical length (50 instructions). The length was fixed, but through the use of NOP instructions the active code was effectively of variable size. The nine programs were concatenated, so effectively an individual was an array of 450 instructions. Programs were initially all made up only of NOP instructions, but they were immediately mutated with point mutation at a mutation rate of 8%, so that on average approximately 4 instructions in each of the 9 programs were non-NOP. When an instruction was mutated, it was replaced with a random instruction from the whole primitive set. These parameter choices were based on some preliminary tests.
The system used tournament selection with tournament size 10. At each iteration, the system randomly decided whether to re-evaluate the fitness of an individual (keep in mind that our fitness function is noisy) or to create a new individual. It re-evaluated fitness with probability 0.1 and performed crossover with probability 0.9. When fitness re-evaluation was chosen, the new fitness value was blended with the old one via a weighted average of the two. This effectively low-pass filters the fitness values using a simple IIR filter, thereby eventually leading fitness values to stabilise around the expected value for each program. When crossover was performed, two parent individuals were selected and 9-point crossover was performed. The 9 points were not constrained to fall within the 9 programs that form an individual. Crossover returned one offspring after each application. The offspring was mutated using point mutation with a mutation rate of 4% (so, on average, each program was hit by two mutations) and then evaluated. The offspring was then inserted in the population, replacing an individual selected using a negative tournament (also with tournament size 10). Given the computationally heavy nature of the task, we used populations of size 1,000 and 5,000 and performed 50 generations in each run. To see what kind of results could be obtained with smaller runs, we also performed runs with a population size of 50 run for 20 generations (for a total of 1,000 fitness evaluations).
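The steady-state loop described above can be sketched as follows (the blending coefficient of 0.5 and the helper functions are assumptions; the text only specifies that old and new fitness values are blended):

```python
import random

def steady_state_step(pop, fitnesses, noisy_fitness, crossover, mutate,
                      tournament_size=10, p_reeval=0.1, alpha=0.5):
    """One iteration of the steady-state schedule described in the text."""
    if random.random() < p_reeval:
        # Re-evaluate a random individual; IIR-filter (blend) its fitness.
        i = random.randrange(len(pop))
        fitnesses[i] = alpha * fitnesses[i] + (1 - alpha) * noisy_fitness(pop[i])
    else:
        # Size-10 tournament selection of two parents.
        def select():
            c = random.sample(range(len(pop)), tournament_size)
            return max(c, key=lambda k: fitnesses[k])
        child = mutate(crossover(pop[select()], pop[select()]))
        # Negative tournament: replace the worst of 10 random individuals.
        c = random.sample(range(len(pop)), tournament_size)
        worst = min(c, key=lambda k: fitnesses[k])
        pop[worst] = child
        fitnesses[worst] = noisy_fitness(child)

# Toy usage: real-valued individuals, fitness peaked at 0.5.
random.seed(0)
pop = [random.random() for _ in range(20)]
fit = [-(x - 0.5) ** 2 for x in pop]
for _ in range(300):
    steady_state_step(pop, fit,
                      noisy_fitness=lambda x: -(x - 0.5) ** 2,
                      crossover=lambda a, b: (a + b) / 2,
                      mutate=lambda x: x + random.gauss(0, 0.01))
assert len(pop) == 20 and max(fit) > -0.05
```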
The data used for our experiments were obtained as follows. We modified an experiment originally designed by [11]. In the experiment a composite stimulus is presented at a randomly chosen location (out of four possible locations) on a display for a very short time (between 50 and 150 ms depending on conditions). The task of the subject is to identify whether the stimulus represents a target or a non-target. To correctly perform the task, participants needed to identify and conjoin multiple features of the stimulus and then click a button to signal their response. While the participants performed the task they were connected to electroencephalographic equipment so that the waves generated during the task in different areas of the brain could be recorded. We used a BioSemi ActiveTwo system with 64 pre-amplified electrodes plus additional electrodes on the earlobes, the external canthi and infraorbital positions. Signals were acquired at 2048 samples per second, band-pass filtered between 0.15 and 40 Hz and, finally, downsampled to 512 samples per second. We tested six students from the University of Essex, all with normal or corrected-to-normal vision. Each experiment lasted about one hour, with about one further hour for preparation and practice.
Trials were classified according to whether the target was present or absent and according to whether the response was correct or incorrect. This resulted in four conditions: true positives (target present, correct response), true negatives (target absent, correct response), false positives (target absent, incorrect response) and false negatives (target present, incorrect response). For the tests reported in this paper we focused on the largest class, the true negatives, which included a total of 2967 trials. We used epochs of approximately 1200 ms (614 samples). That is, each trial contained a vector of 614 signal amplitude samples for each electrode. Each trial had an associated response (reaction) time, which represents the time elapsed between the presentation of the stimulus and the response provided by the user in the form of a mouse click. Following [22], the 10% of the trials with the longest response times were discarded. This left 2670 trials. In order to evaluate the fitness of an individual in the population, we needed to run the nine programs included in the individual on each of the trials in the dataset, i.e., the GP interpreter was invoked over 24,000 times before the fitness function could start executing.
With the fitness function defined in Section 4, the objective of evolution is to identify three membership functions which allow one to divide up this dataset into bins based on response times in such a way as to maximise the mutual statistical significance of differences in the bins' amplitude averages. Note that evolution can choose to evolve functions that discard certain ranges of response times if this is advantageous.
With 3 bins (i.e., 3 bin-vs-bin comparisons), 64 electrodes and 614 samples per epoch, evaluating our fitness function would require running 117,888 Kolmogorov-Smirnov tests per fitness evaluation. Since such tests are rather computationally expensive, we decided to scale down the problem by concentrating on one particular electrode (`Pz') and by further subsampling the amplitude data by a factor of 16. So, after performing the binning of the dataset, we needed to run the Kolmogorov-Smirnov test approximately 115 times (3 × 614/16) per fitness evaluation.
We show the response-time distribution recorded in our experiments for the true negatives in Figure 2 (note that amplitudes have been normalised so that the curves are density functions; abscissas are in seconds). The boundaries of the 30%-quantile fixed-size bins produced with the method described in [22] are shown as vertical lines in Figure 2, together with the medians and standard deviations (estimated using MAD) for the whole distribution and for the bins. As indicated above, the objective of GP is to probabilistically divide up this distribution into bins using appropriate membership functions in such a way as to maximise the statistical significance of bin averages.
The fitness value for the standard membership functions (rectangular bins) is approximately 0.8297, which corresponds to a mean Kolmogorov-Smirnov p-value of 0.1703. This implies that differences between bin averages are statistically significant at the standard confidence levels of 0.10 and 0.05 for only a fraction of the time steps in an epoch. We want GP to improve on this.

We performed 50 runs with populations of size 50 and 1,000, and 10 runs with populations of size 5,000, on a 182-core Linux cluster with Xeon CPUs. We report the mean, standard deviation, min and max of the best fitnesses, as well as the quartiles of the fitness distribution recorded in our experiments, in Table 2. As one can see, in all conditions the method is very reliable, all standard deviations being very small. Even with the smallest population, GP improved over the standard binning technique in all runs. This is particularly remarkable given that such runs required only approximately 2 minutes of CPU time each. Naturally, only runs with 1,000 and 5,000 individuals consistently achieved best fitnesses close to or exceeding 0.9, which corresponds to mean p-values of 0.1 or less. This is a very significant improvement over the value associated with rectangular bins: now, for a large proportion of the time steps in an epoch, differences between bin averages are statistically significant. CPU time was approximately 4 hours for runs with 1,000 individuals and approximately one day for runs with 5,000 individuals. Note that these long training times are not a problem in the domain of ERP analysis, since setting up an experiment, trialling it, collecting the data with independent subjects, preparing the data for averaging and finally interpreting them after averaging requires many weeks of work.
Population size 50, 20 generations
Statistic | Best      | Qrtl 1    | Qrtl 2    | Qrtl 3    | Qrtl 4
Mean      | 0.87750   | 0.86354   | 0.86020   | 0.85613   | 0.17514
StdDev    | 0.008868  | 0.006952  | 0.007123  | 0.008249  | 0.272409
Max       | 0.900335  | 0.877868  | 0.876486  | 0.872651  | 0.753881
Min       | 0.855929  | 0.845703  | 0.842577  | 0.837546  | 0.000000

Population size 1,000, 50 generations
Statistic | Best      | Qrtl 1    | Qrtl 2    | Qrtl 3    | Qrtl 4
Mean      | 0.89862   | 0.88161   | 0.88056   | 0.87910   | 0.00000
StdDev    | 0.00396   | 0.00293   | 0.00288   | 0.00307   | 0.00000
Max       | 0.91348   | 0.89096   | 0.88979   | 0.88922   | 0.00000
Min       | 0.89197   | 0.87720   | 0.87526   | 0.87346   | 0.00000

Population size 5,000, 50 generations
Statistic | Best      | Qrtl 1    | Qrtl 2    | Qrtl 3    | Qrtl 4
Mean      | 0.90431   | 0.88301   | 0.88214   | 0.88091   | 0.00000
StdDev    | 0.0060682 | 0.0039270 | 0.0038899 | 0.0040199 | 0.00000
Max       | 0.91763   | 0.89148   | 0.89053   | 0.88947   | 0.00000
Min       | 0.89914   | 0.88039   | 0.87956   | 0.87794   | 0.00000
In order to achieve this high level of performance and reliability in the ERP binning problem, GP has discovered how to partition the data based on response times in such a way as to optimally balance two needs: (a) the need to include as many trials as possible in each bin, so as to reduce noise in both variable-latency and fixed-latency ERP components, and (b) the need to make the bins as narrow as possible, so as to reduce the systematic errors associated with averaging variable-latency components.
As an example, we plot the best bin membership functions evolved in the 50 runs with a population of 1,000 individuals in Figure 3.
The ERP averages produced by this solution are shown in Figure 4. For reference, we show the averages obtained with traditional rectangular bins in Figure 5. As one can see, the ERP averages for the middle bins are almost identical to the full average in both cases. This is because both the reference bin and the GP-evolved bin capture the median response time and surrounding samples, which are representative of the central tendency of the whole distribution. However, when comparing the ERP averages for bins 1 and 3 with the corresponding reference averages, we see that the membership functions evolved by GP are more selective in their choice of trials. This produces bigger (and hence more statistically significant) variations between groups. Particularly interesting is the case of bin 3, which, with the standard binning method, is adjacent to bin 2 and very broad. This led to averaging ERP components with an excessively wide distribution of latencies, yielding an ERP average where late endogenous components, which are typically associated with the preparation of the response, are hardly discernible. Instead, GP has produced a much narrower bin 3 and a large gap between bins 2 and 3. As one can see from Figure 4, this yields a much clearer representation of such late potentials.
In this paper we used a multi-program form of register-based GP to discover probabilistic membership functions for the purpose of binning and then averaging ERP trials based on response times. The objective was to evolve membership functions which could significantly improve the mutual statistical significance of bin averages, thereby capturing true brain waves more accurately than simple rectangular bins.
Our results are very encouraging. GP can consistently evolve membership functions that almost double the statistical significance of the ERP bin averages with respect to the standard binning method.
In future work we will test the generality of the evolved solutions by applying the bins found by GP to newly acquired (unseen) data. We also intend to make use of our new bin-averaging technique in BCIs. Indeed, the work presented in this paper originated from the need to understand exactly how stimulus features and task complexity, as well as cognitive errors, modulate ERP components in BCI [6,8,7]. Our long-term objective is to formally link quantitative psychological models of feature binding and perceptual errors [15,5,4] with the presence of specific ERP components and the modulation of their latency and amplitude. This knowledge could then be used to design a new generation of BCIs where the behaviour and features of human cognitive systems are best exploited.
We would like to thank the Engineering and Physical Sciences Research Council (grant EP/F033818/1) and the Experimental Psychological Society (UK) (grant ``Binding Across the Senses'') for financial support, and Francisco Sepulveda for helpful comments and suggestions.