Abstract:

The P300 is an endogenous Event-Related Potential (ERP) which is naturally elicited by rare and significant external stimuli. P300s are increasingly used in Brain-Computer Interfaces (BCIs) because users of ERP-based BCIs need no special training. However, P300 waves are hard to detect and, therefore, multiple target-stimulus presentations are needed before an interface can make a reliable decision. While significant improvements have been made in the detection of P300s, no particular attention has been paid to the variability in shape and timing of P300 waves in BCIs. In this paper we start filling this gap by documenting, modelling and exploiting a modulation in the amplitude of P300s related to the number of non-targets preceding a target in a Donchin speller. The basic idea in our approach is to use an appropriately weighted average of the responses produced by a classifier during multiple stimulus presentations, instead of the traditional plain average. This makes it possible to give more weight to events that are likely to be more informative, thereby increasing the accuracy of classification. The optimal weights are determined through a mathematical model that precisely estimates the accuracy of our speller as well as the expected performance improvement w.r.t. the traditional approach. Tests with two independent datasets show that our approach provides a marked, statistically significant improvement in accuracy over the top-performing algorithm presented in the literature to date. The method and the theoretical models we propose are general and can easily be used in other P300-based BCIs with minimal changes.

Documenting, modelling and exploiting P300 amplitude changes due to variable target delays in Donchin's speller

Luca Citi, Riccardo Poli, and Caterina Cinel
Brain-Computer Interfaces Lab,
School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK



Introduction

Brain-Computer Interfaces (BCIs) measure specific (intentionally and unintentionally induced) brain activity signals and translate them into device control signals [1,2].

Many factors limit the performance of BCIs. These include: the natural variability and noise in the brain signals measured; the limitations of recording devices, of the processing methods that extract signal features and of the algorithms that translate these features into device commands; the quality of the feedback provided to the user; user-related factors such as lack of motivation, tiredness, limited understanding of the task, age and handedness; the combination of mental tasks used; and the natural limitations of the human perceptual system [3].

While amplitude and shape variations in brain waves are often a hindrance for BCIs, in some cases they carry information which can be exploited to improve performance, irrespective of whether a physiological explanation for such variations is available [3,4]. Some of these variations are hard to document because standard averages may severely misrepresent what really goes on in the brain on a trial-by-trial basis [5,6,7,8], although more sensitive averaging techniques have recently become available [9,10].

In this paper, we will document, theoretically model and exploit a source of waveform variations in P300 waves, namely, their modulation caused by variations in the interval between target-stimulus presentations. We illustrate our ideas using the Donchin speller, although the benefits of the approach can also be accrued in other P300-based systems.

The paper is organised as follows. In section 2 we provide some background on P300 waves and on the factors that may affect their characteristics, paying particular attention to effects associated with the timing of stimulus presentation. In the same section we also introduce the Donchin speller paradigm. In section 3, we document the effects of the number of non-targets preceding a target on the amplitude and shape of P300s. In section 4, we consider a typical two-stage implementation of Donchin's speller, where one stage processes the responses to single stimuli while the other integrates such responses and decides which character to enter. We then derive mathematical models of the two stages, which, together, express the speller's accuracy, and generalise such models to make them applicable to a variety of ERP-based BCIs. Section 5 extends the model of the first stage of Donchin's speller so as to consider the modulations of P300s associated with stimulus history documented in section 3. In section 6, we propose and model a new algorithm for the second stage of the speller that exploits this effect, and the model makes it possible to compare its performance to that of the standard speller. The analysis (corroborated by Monte Carlo simulations) indicates that our algorithm is superior to the top-performing Donchin speller from the literature. In section 7 we verify the predictions of our mathematical models using two independent datasets: one from the BCI Competition III and one comprising 12 subjects tested in our lab. Results show that the new system is statistically significantly superior to the original. We discuss and position our results in section 8. Finally, we present some conclusions and indicate promising avenues for future work in section 9.


Background


P300 waves and factors affecting their characteristics

Among the different approaches used in BCI studies, those based on event-related potentials (ERPs) [11,12,13] are rather appealing since, at least in principle, they offer a relatively high bit-rate and require no user training. ERPs are variations of the ongoing EEG with relatively well-defined shapes, which are elicited by a stimulus and are temporally linked to it. ERPs include an exogenous response, due to the primary processing of the stimulus, as well as an endogenous response, which reflects higher cognitive processing induced by the stimulus [14].

The P300 wave is an endogenous ERP with a latency of approximately 300 to 600 ms, which is elicited by rare and/or significant stimuli (visual, auditory, or somatosensory). Effectively, the presence of P300 ERPs depends on whether or not a user attends a rare, deviant or target stimulus. This is what makes it possible to use them, in BCI systems, to determine user intent. For an overview of the cognitive theory and the neurophysiological origin of the P300 see [15].

The characteristics of the P300, particularly its amplitude and latency, depend on several factors [16]. Some factors, such as food intake, fatigue or use of drugs, are related to the psychophysical state of the subject [17]. Other factors, such as the number of stimuli, their size and their relative position, are related to the spatial layout of the stimuli [18,19,20]. Finally, there are several important factors related to the timing and sequence of stimuli. Since these are the focus of this paper, we devote the rest of this section to summarising what is known about them.

Several studies have reported that P300 amplitude increases as target probability decreases (see [14] for a review). The P300 amplitude also seems to be positively correlated with the inter-stimulus interval (ISI) or the stimulus onset asynchrony (SOA) (e.g., see [21,22,23]). Other studies [24,25] note that, although the P300 is clearly affected by target-stimulus probability and ISI, manipulating either factor also changes the average target-to-target interval (TTI), and hypothesise that TTI is the true factor underlying the P300 amplitude effects attributed to target probability, sequence length, and ISI. The P300 is also sensitive to the order of non-target and target stimuli, most likely because this temporarily modifies target-stimulus probabilities and TTIs. Indeed, there is a positive correlation between P300 amplitude and the number of non-target stimuli preceding a target (e.g., [26,27]).

In order to avoid overlap between two consecutive epochs, in psychophysiology the effects discussed above have always been studied with SOAs of approximately one second or more. However, such long SOAs would drastically limit the information transfer rate in BCIs, where SOAs of less than 300 ms are much more common. Even at these short SOAs, the P300 amplitude is reduced when two targets are close together in a sequence of stimuli. However, this is more likely the result of phenomena like repetition blindness or attentional blink [3] than of the TTI effect documented with slower protocols.

The influence of the timing and sequence of stimuli preceding a target on the P300 could be partially explained as the result of ``recovery cycle'' limitations in the mechanisms responsible for the generation of this ERP [24]. The smaller potentials produced when a target is presented shortly after another target might simply be the result of the brain not having yet reacquired the necessary resources to produce large ERPs.

Some ascribe such effects to an inability to consistently generate P300s in the presence of short TTIs [28], suggesting that a smaller average amplitude might be the result of an increase in the fraction of responses to target stimuli that do not present a P300 rather than a modulation of the P300 amplitude in all responses.

In summary, the literature indicates that the order and timing of stimuli have measurable effects on P300s. In an effort to capture and exploit some of these effects to improve BCI performance, in this paper we will focus on the number of non-targets preceding a target stimulus. Note that we make no claims regarding whether or not this number is the most primitive factor affecting P300 amplitudes. All we need for our method to improve performance is that the P300's amplitude be correlated with this number, which is the case for Donchin's speller, as we will show in section 3.


Donchin ERP-based speller

Farwell and Donchin [11] designed a speller based on the P300 ERP, which is now in widespread use. In this speller, users are presented with a 6 by 6 matrix of characters (see figure 1), the rows and columns of which are highlighted for a short period and one at a time. The sequence of flashes is generated randomly without replacement, i.e., a row or column is not allowed to flash again until all other rows and columns have flashed. The users' task is to focus their attention on the character they want to input. Thus, the flashes of the row and column containing the desired character represent target stimuli. As they are relatively rare, they elicit a P300 wave. In principle, a BCI could then identify the row and column of the target character, thereby inferring it, by looking for the largest P300-like responses in the EEG signals acquired immediately after the flashing of all the rows and columns, respectively. Unfortunately, noise makes reliably recognising P300s in single trials very difficult. However, by asking users to focus their attention on the character of interest for multiple blocks of 12 flashes, and averaging either the signal epochs related to each row and column or the corresponding outputs of a P300 classifier, it is possible to infer the target character with acceptable accuracy [29].

Figure 1: The matrix of characters used in the Donchin speller.
\includegraphics[width=.25\textwidth]{Donchin}


Stimulus history effects in the Donchin speller

Evidence of the shape of P300s being modulated by the temporal distance between consecutive targets in the Donchin speller has recently been reported [30,31,32]. In this work, we will take this one step further by modelling and exploiting this phenomenon in order to improve the performance of the speller. Before we do so, however, in this section we want to present experimental evidence further corroborating the results of [30,31,32]. Related experimental data will be used in Section 5.1 to corroborate our model.

We studied the training set of the two subjects in dataset II from the BCI competition III [33] (results with 12 further subjects will be presented in section 7.2). Data were collected with the Donchin speller protocol described above, using an SOA of 175 ms. For each subject, the training set consisted of 85 blocks. Each block corresponds to a subject focusing on one character for 15 repetitions of the sequence of flashes of the 6 rows and 6 columns in the display. Further details on the data can be found in [33].

We bandpass-filtered the signals in the band 0.15-5 Hz (HPF: 1600-tap FIR; LPF: 960-tap FIR) to reduce exogenous components at the stimulus presentation frequency (5.7 Hz) and higher. One-second epochs following each flash were extracted. Therefore, for each subject, a total of 15300 (85$\times$15$\times$12) epochs were available, of which 2550 (85$\times$15$\times$2) corresponded to targets.
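For concreteness, the following is a minimal Python sketch of this preprocessing step, assuming the EEG is available as a (samples $\times$ channels) array \verb|eeg| sampled at the dataset's 240 Hz and that \verb|stim_onsets| holds the sample index of each flash (both names are ours, for illustration). The zero-phase filtering via \verb|filtfilt| is our choice, not necessarily that of the original pipeline.

\begin{verbatim}
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 240   # sampling rate of the BCI Competition III data (Hz)

# FIR filters roughly matching the text (tap counts made odd, for type-I FIRs)
hpf = firwin(1601, 0.15, fs=fs, pass_zero=False)   # high-pass at 0.15 Hz
lpf = firwin(961, 5.0, fs=fs)                      # low-pass at 5 Hz

filtered = filtfilt(lpf, 1.0, filtfilt(hpf, 1.0, eeg, axis=0), axis=0)

# one-second epochs following each flash (fs samples each)
epochs = np.stack([filtered[t:t + fs] for t in stim_onsets])
\end{verbatim}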

The set of epochs was partitioned according to the number, $h$ (for ``history''), of non-targets presented between the previous target and the stimulus corresponding to the current epoch. For example, for the sequence ...TNNNT..., the second target (T) is assigned to partition t3 because it is preceded by three non-targets (N).

Outliers, i.e., epochs with unusually large deflections due to movements or eye-blinks, were removed using the following procedure. For each set of epochs, the first quartile, $\mathrm{q}_1(i)$, and third quartile, $\mathrm{q}_3(i)$, at each sample, $i$, were found. Then an acceptance ``strip'' was defined as the time-varying interval $[\mathrm{q}_1(i) - 1.5 \Delta \mathrm{q}(i),\mathrm{q}_3(i) + 1.5 \Delta \mathrm{q}(i)]$, where $\Delta \mathrm{q}(i)=\mathrm{q}_3(i)-\mathrm{q}_1(i)$ is the interquartile range. Epochs falling outside the acceptance strip for more than one tenth of their samples were rejected. For each partition, the mean, $\mathrm{m}(i)$, and the standard error, $\mathrm{ste}(i)$, of the responses for each class were finally evaluated using the remaining epochs.
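This artefact-rejection rule is straightforward to implement. Below is a minimal sketch in Python, assuming \verb|epochs| is an $n_\mathrm{epochs} \times n_\mathrm{samples}$ array for one partition (the variable names are ours).

\begin{verbatim}
import numpy as np

def reject_outliers(epochs, max_outside=0.1):
    # per-sample quartiles across epochs
    q1 = np.percentile(epochs, 25, axis=0)
    q3 = np.percentile(epochs, 75, axis=0)
    dq = q3 - q1                                  # interquartile range
    lo, hi = q1 - 1.5 * dq, q3 + 1.5 * dq         # acceptance "strip"
    outside = (epochs < lo) | (epochs > hi)
    keep = outside.mean(axis=1) <= max_outside    # reject if >1/10 outside
    return epochs[keep]

clean = reject_outliers(epochs)
m = clean.mean(axis=0)                                  # mean response
ste = clean.std(axis=0, ddof=1) / np.sqrt(len(clean))   # standard error
\end{verbatim}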

Figure 2 shows the average responses obtained from the epoch-partitioning and artefact-rejection procedures described above. The results suggest that, despite Donchin's speller being characterised by a fast SOA, significant modulations of the P300 amplitude related to the number, $h$, of non-targets preceding a stimulus are present. The most significant effect is visible around the peak of the ``t'' line, which represents the average P300. Indeed, if we focus on the region around 450 ms for subject A and 350 ms for subject B, we see that the P300 is hardly visible in the average associated with the ``t0'' partition, which represents two targets in a row. The same is true for the average of partition ``t1''. The averages for the other partitions show a progressive increase in P300 amplitude as the number of non-targets separating a target from the previous target grows.

The effect is particularly clear in figure 3 where we plotted P300 amplitudes vs. $h$ for subjects A and B at 450 ms and 350 ms, respectively. Note that, because of the randomisation within each block of 12 stimulus presentations, it is possible, albeit infrequent, for a target stimulus in a block to be followed by up to 20 non-target stimuli before a target stimulus in the following block is presented.

Figure 2: Modulation of the amplitude and shape of P300s as the number of non-targets preceding the current stimulus, $h$, varies for the two subjects in the BCI competition III Donchin's speller dataset. The thin lines are averages of ERPs recorded in channel Cz for the epochs partitioned according to $h$ (lines ``t0'' to ``t9+''). The thick line labelled ``t'' is the average of all targets; a vertical dashed line marks the time where this reaches its peak in each plot. The line ``nt'' is the average of an equally-sized random sample of non-targets. In all cases the number of epochs averaged (after artefact rejection) is shown in brackets.
\includegraphics[width=9cm]{Effects_Subject_A_Train_Cz_bw_simp}
Subject A



\includegraphics[width=9cm]{Effects_Subject_B_Train_Cz_bw_simp}
Subject B

Figure 3: A cross section of the data in figure 2 taken at the time when the average target ERP (line ``t'' in figure 2) peaks. Whiskers represent standard errors. Note that the groups of partitions ``t2-t4'', ``t5-t8'' and ``t9+'' from figure 2 have been split more finely for a more detailed analysis.
\includegraphics[width=9cm]{Effects_Subject_AB_Train_Cz_maxP300_bw}


Modelling the accuracy of a Donchin speller

An objective of this paper is to model and exploit the P300 modulations documented in the previous section to improve the accuracy of Donchin's speller. As we will see in Section 6, simple modifications to the internal structure of the speller can achieve this. However, before we can proceed with this, we need to describe and model the internal operations of typical Donchin spellers. We do this in this section.


Algorithm of a Donchin speller

In a Donchin speller, inferring the character the user wants to input is a two-stage process (figure 4). The ERPs produced in the 1-second epochs following the flashing of the rows and columns of characters in the display are acquired across all EEG channels and time steps, resulting in a sequence of 2-D arrays (far left of the figure). In the first stage, each such array is processed by a feature extractor/classifier whose output is expected to contain useful information as to whether the corresponding flash represented a target or a non-target (flash score). In the second stage, an algorithm combines the flash scores for different epochs, assigns a score to each character (character score) and chooses the character with the highest score as its classification for the data. Below we provide details on how the two stages are implemented in current ERP-based BCIs.

Figure 4: Block diagram of typical implementations of Donchin's speller (see text for details).
\includegraphics[width=\textwidth]{Diagram_P300_BCI}

As shown in figure 4(left), the first stage often includes a feature extractor/selector that creates a low-pass filtered and sub-sampled version of the epoch following a flash and selects a subset of informative samples and channels. Let us call $x_i$ the resulting vector of features characterising the $i$-th epoch. Feature vectors are typically processed by a classifier that performs a rotation and a projection of the feature space in the direction that maximises some form of separability measure between the target and non-target classes. Frequently the classifier is used in an unconventional way: it is trained as if it were a binary classifier with a two-class output, but it is then used without applying any thresholding as a continuous scoring function, $f$. Let $y_i = f(x_i)$ be the flash score produced by such a function in response to feature vector $x_i$.

As shown in figure 4(right), in its simplest form the second stage can be performed in two steps. The first involves averaging the scores associated with the flashing of each row and column independently and then computing a score for each character by adding up the averages of the corresponding row and column. The second step simply requires choosing the character with the maximum character score [34].

In formulae, in the standard approach, the character at the intersection of row $r$ and column $c$ is given a score

$\displaystyle \bar S_{r,c} = \frac{1}{J} \sum_{\rho \in \mathcal{R}_r} \; y_\rho + \frac{1}{J} \sum_{\chi \in \mathcal{C}_c} \; y_\chi \;,$ (1)

where $ J$ is the number of repetitions of the sequence of stimuli, $ y_\rho$ and $ y_\chi$ are the outputs of the flash scoring function $f$ in the presence of feature vectors $ x_\rho$ and $ x_\chi$, respectively, while $ \mathcal{R}_r$ and $ \mathcal{C}_c$ are sets of indices of the stimuli where row $r$ and column $c$ flashed, respectively. The classifier then chooses the character at position

$\displaystyle (\hat r, \hat c) = \arg\max_{(r,c)} \bar S_{r, c} \; ,$ (2)

in the character grid.
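As a concrete illustration, the two steps in (1) and (2) can be transcribed directly into code. The following Python sketch assumes flash scores \verb|y| and a stimulus sequence \verb|ell| using labels \verb|'R1'|-\verb|'R6'| and \verb|'C1'|-\verb|'C6'| (our naming):

\begin{verbatim}
import numpy as np

def standard_character_score(y, ell, J):
    S = np.zeros((6, 6))
    for r in range(6):
        for c in range(6):
            rows = [y[i] for i, l in enumerate(ell) if l == f'R{r + 1}']
            cols = [y[i] for i, l in enumerate(ell) if l == f'C{c + 1}']
            S[r, c] = sum(rows) / J + sum(cols) / J      # equation (1)
    return np.unravel_index(np.argmax(S), S.shape)       # equation (2)
\end{verbatim}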


Model of the accuracy of a Donchin speller

From the description provided in the previous section, it is clear that the accuracy of Donchin's speller depends on the performance of the two main stages shown in figure 4. In this section we will construct a probabilistic model that will allow us to compute the overall performance of the speller. We will do so by modelling its second stage in detail, while coarse-graining the internal mechanisms of the first stage by modelling only its output, i.e., the flash scores. We do this because later in the paper we will modify the second stage so as to exploit target-history effects, and this will require only knowing and adapting the details of the model of the second stage of the standard speller.

Before we start, we need to model the sequence of stimuli, since the output of a speller and its performance are necessarily functions of the visual stimuli as well as, naturally, of the user's intentions. We model the sequence of $ J \times 12 $ flashes presented to the subject as stochastic vector $ L = (L_1, L_2, L_3, ... )$. Each component $ L_i$ of the vector is itself a stochastic variable drawn from the set of outcomes $ \{\mathsf{R}_1,\dots,\mathsf{R}_6,\mathsf{C}_1,\dots,\mathsf{C}_6\}$ where $ \mathsf{R}_r$ represents the event ``the $r$-th row flashed'', while $ \mathsf{C}_c$ represents the event ``the $c$-th column flashed''. Let us denote with $ \ell$ an instantiation of $ L$ and with $ \ell_i$ the instantiation of the element/variable $ L_i$.

The sequence of stimuli needs to be treated as a stochastic variable rather than a constant because of the randomness in the sequence generation process described in section 2.2. Thus, in different repetitions of the experiment different sequences of stimuli will likely be presented to the user. This will happen, for example, if the user needs to enter different letters or the same letter multiple times.


Modelling Stage 1

ERPs are modulated by the nature and type of the stimuli eliciting them. However, even in the presence of identical external stimuli, ERPs are typically affected by a great deal of noise and variability. As a result the feature vectors, $x_i$, and the flash scores, $ y_i$, produced in the first stage of Donchin's speller should be treated as variable, or, more specifically, as stochastic functions of the variables $ L_i$ introduced above.

In principle, to fully model the first stage of the speller, which corresponds to the process of assigning a score to each flash, one would need to identify the joint probability distribution of the variables $y_1, y_2, \dots$ as a function of $L$. However, this is very hard to do. Therefore, here we will introduce some simplifying assumptions, which, as we will show later, still produce accurate models. Firstly, we will assume that all the scores $y_i$ associated with the flashing of a row or column containing the target character are independent and identically distributed, and that the same is true for the scores produced in response to non-target flashes. Secondly, if we denote by $Y_{\mathrm{nt}}$ any stochastic variable that represents the flash score in the presence of a non-target and by $Y_{\mathrm{t}}$ any variable representing the flash score in the presence of a target, we further assume that such variables have different means, which we will denote with $\alpha_{\mathrm{nt}}$ and $\alpha_{\mathrm{t}}$, respectively. Thirdly, we assume that, other than for the mean, such variables have the same distribution, i.e., they can be represented as

$\displaystyle Y_{\mathrm{nt}} = \alpha_{\mathrm{nt}} + \mathring Y$ (3)
$\displaystyle Y_{\mathrm{t}} = \alpha_{\mathrm{t}} + \mathring Y$ (4)

for some assignment of the parameters $\alpha_{\mathrm{t}}$ and $\alpha_{\mathrm{nt}}$, where $\mathring Y$ is a zero-mean stochastic variable. Finally, we assume that $\mathring Y$ is normally distributed, i.e., $\mathring Y \sim N(0,\sigma^2)$ for some assignment of $\sigma$, and we will interpret this variable as the noise in the flash scores.

Naturally, exactly which variables $ y_i$ in a sequence are of type $ Y_{\mathrm{t}}$ and which are of type $ Y_{\mathrm{nt}}$ depends on whether the $i$-th flash is that of a row/column containing the target character or not, respectively.


Modelling Stage 2

Based on the observations above, it is clear that the character scores, $\bar S_{r,c}$, computed in the second stage of the speller via (1), are averages of stochastic variables (the flash scores, $y_i$) and, thus, are stochastic themselves. Randomness in the $\bar S_{r,c}$ is not only due to the $y_i$'s but also to the fact that the sets of indices $\mathcal{R}_r$ and $\mathcal{C}_c$ in (1) are themselves stochastic. In fact, as one can immediately see, if we formalise their definition, i.e., $\mathcal{R}_r = \{k \,\vert\, L_k = \mathsf{R}_r\}$ and $\mathcal{C}_c = \{k \,\vert\, L_k = \mathsf{C}_c\}$, they are functions of the stimulus sequence $L$.

An important consequence of the 36 character scores $ \bar S_{r,c}$ being stochastic variables is that one cannot say which character has the maximum value: in principle in different instantiations of the 36 scores $ \bar S_{r,c}$ different characters could present the maximum score. If the target character is at position $ (\hat r, \hat c)$, the returned character will be correct if and only if

$\displaystyle \bar S_{\hat r, \hat c} = \max_{r,c} \bar S_{r, c} \; ,$ (5)

ignoring the zero-probability event of a tie. The objective of this section, computing the accuracy of the speller, which we will denote with $ A$, can, therefore, be recast as determining the probability of (5) being the case. In formulae:

$\displaystyle A = \mathrm{Pr}\left(\bar S_{\hat r, \hat c} = \max_{r,c} \bar S_{r, c}\right) .$ (6)

Let us start by rewriting (1) in terms of stochastic variables:

$\displaystyle \bar S_{r,c} = \frac{1}{J} \sum_{\rho \in \mathcal{R}_r} \; Y_\rho + \frac{1}{J} \sum_{\chi \in \mathcal{C}_c} \; Y_\chi \;,$ (7)

where $ Y_\rho$ and $ Y_\chi$ now are stochastic score variables. This, in turn, can be rewritten as

$\displaystyle \bar S_{r,c} = \bar U_{r} + \bar V_{c}$ (8)

where $\bar U_{r}$ is the average of the $J$ variables $Y_\rho$ for $\rho \in \mathcal{R}_r$, while $\bar V_{c}$ is the average of the $J$ variables $Y_\chi$ for $\chi \in \mathcal{C}_c$. Note that both terms in (8) are stochastic variables.

The random variable $\bar U_{r}$ has a distribution that depends on whether or not $r=\hat r$. Specifically, if $r=\hat r$, the variables $Y_\rho$ in (7) are all distributed like $Y_{\mathrm{t}}$. Therefore, $\bar U_{\hat r}$ is the average of $J$ independent instances of $Y_{\mathrm{t}}$ and, so, it is a normal variable $\bar U_{\hat r} \sim N(\mu_{\bar U_{\hat r}},\sigma_{\bar U_{\hat r}}^2)$ with $\mu_{\bar U_{\hat r}} = \alpha_{\mathrm{t}}$ and $\sigma_{\bar U_{\hat r}}^2 = \sigma^2 / J$. Similarly, if $r \neq \hat r$, the variables $Y_\rho$ in (7) are all distributed like $Y_{\mathrm{nt}}$. Thus, if we denote with $\ast$ any non-target row, $\bar U_{\ast}$ is a normal variable $\bar U_{\ast} \sim N(\mu_{\bar U_{\ast}},\sigma_{\bar U_{\ast}}^2)$ with $\mu_{\bar U_{\ast}} = \alpha_{\mathrm{nt}}$ and $\sigma_{\bar U_{\ast}}^2 = \sigma^2 / J$.

Using similar arguments, it is easy to show that the variables $ \bar V_{\hat c}$ and $ \bar V_{\ast}$ are distributed like $ \bar U_{\hat r}$ and $ \bar U_{\ast}$, respectively.

Because of the independence of the variables $Y_i$ and the geometry of the speller's grid, for any $r_1 \ne r_2$ the set of $Y_i$ averaged in $\bar U_{r_1}$ and the set of $Y_i$ averaged in $\bar U_{r_2}$ are disjoint. The same is true of the $\bar V_c$ variables. In other words, the 6 variables $\bar U_{r}$ and the 6 variables $\bar V_{c}$ are all mutually independent. Therefore,

$\displaystyle \bar S_{\hat r, \hat c} = \max_{r,c} \bar S_{r, c} \quad \iff \quad \bar U_{\hat r} > \bar U_{\mathrm{max}} = \max_{r \neq \hat r} \bar U_{r} \quad \text{and} \quad \bar V_{\hat c} > \bar V_{\mathrm{max}} = \max_{c \neq \hat c} \bar V_{c}$ (9)

and, from (6),

$\displaystyle A = \mathrm{Pr}\left(\bar S_{\hat r, \hat c} = \max_{r,c} \bar S_{r, c}\right) = \mathrm{Pr}\left(\bar U_{\hat r} > \bar U_{\mathrm{max}}\right) \: \mathrm{Pr}\left(\bar V_{\hat c} > \bar V_{\mathrm{max}}\right) .$ (10)

Hence, to make further progress in our modelling effort, we need to compute $ \mathrm{Pr}\left(\bar U_{\hat r} > \bar U_{\mathrm{max}}\right)$ and $ \mathrm{Pr}\left(\bar V_{\hat c} > \bar V_{\mathrm{max}}\right)$.

Let us start with $\mathrm{Pr}\left(\bar U_{\hat r} > \bar U_{\mathrm{max}}\right)$. Having already computed the distribution of $\bar U_{\hat r}$, we first need to compute the distribution of $\bar U_{\mathrm{max}}$. To do so, we exploit the fact that the maximum of multiple variables is less than a value $x$ if and only if each individual variable is less than $x$. Therefore, for independent variables, the cumulative distribution function of their maximum is the product of the cumulative distribution functions of the individual variables. That is

$\displaystyle \mathrm{F}_{\bar U_{\mathrm{max}}}(x) = \mathrm{Pr}\left(\bar U_{\mathrm{max}} < x\right) = \prod_{r \neq \hat r} \mathrm{Pr}\left(\bar U_r < x\right) = \left[\mathrm{F}_{\bar U_\ast}(x)\right]^5 \; ,$

where the symbol $ \mathrm{F}_{.}(x)$ represents cumulative distribution functions.

We are now in a position to compute $ \mathrm{Pr}\left(\bar U_{\hat r} > \bar U_{\mathrm{max}}\right)$. In general, given two independent random variables $ X$ and $ Y$, the probability of $ X > Y$ is

\begin{displaymath}
\mathrm{Pr}\left(X > Y\right) = \int_{\mathbb{R}} \mathrm{Pr}\left(Y < x\right) \, \mathrm{f}_{X}(x) \;\mathrm{d}{x} = \int_{\mathbb{R}} \mathrm{f}_{X}(x) \: \mathrm{F}_{Y}(x) \;\mathrm{d}{x} \; ,
\end{displaymath}

where $\mathrm{f}_{X}$ is the probability density function of $X$.

Thus,

\begin{displaymath}
\mathrm{Pr}\left(\bar U_{\hat r} > \bar U_{\mathrm{max}}\right) = \int_{\mathbb{R}} \frac{1}{\sigma_{\bar U_{\hat r}}} \, \mathrm{normpdf}\left(\frac{x-\mu_{\bar U_{\hat r}}}{\sigma_{\bar U_{\hat r}}}\right) \left[ \mathrm{normcdf}\left(\frac{x-\mu_{\bar U_\ast}}{\sigma_{\bar U_\ast}}\right) \right]^5 \;\mathrm{d}{x} \; ,
\end{displaymath} (11)

where

$\displaystyle \mathrm{normpdf}\left(x\right) = \frac{1}{\sqrt{2\pi}} \exp \left(-\frac{x^2}{2} \right) \qquad \text{and} \qquad \mathrm{normcdf}\left(x\right) = \frac{1}{2} \left(1+\mathrm{erf}\left( \frac{x}{\sqrt{2}} \right) \right) \; .$

Using the same approach one can easily compute $ \mathrm{Pr}\left(\bar V_{\hat c} > \bar V_{\mathrm{max}}\right)$. By substituting that result and (11) into (10), we finally obtain the following expression for the accuracy of Donchin's speller:

$\displaystyle A = \left\{ \int_{\mathbb{R}} \frac{1}{\sigma_{\bar U_{\hat r}}} \, \mathrm{normpdf}\left(\frac{x-\mu_{\bar U_{\hat r}}}{\sigma_{\bar U_{\hat r}}}\right) \left[ \mathrm{normcdf}\left(\frac{x-\mu_{\bar U_\ast}}{\sigma_{\bar U_\ast}}\right) \right]^5 \;\mathrm{d}{x} \right\}^2 \; .$ (12)

This expresses the accuracy of the speller as a function of the characteristics of the score distributions for targets and non-targets obtained from the first stage of the speller. Therefore, in order to use this formula to measure the performance of a particular realisation of Donchin's speller, one needs to empirically identify the means and standard deviation of the outputs of the flash scoring block for that implementation and then feed them into (12). We will postpone this parameter-identification phase until section 6, where we will develop a more accurate model of the first stage which will take target history effects into account.


Generalising the model

While the system described and modelled above is the most typical incarnation of Donchin's speller, other types of ERP-based speller have also been considered in the literature. By using procedures similar to the one we followed to obtain (12), we analysed a variety of such systems and modelled their accuracy. The spellers we considered are listed in the first column of table 1. These include Donchin spellers with different matrix sizes and different features for the distributions of flash scores for targets and non-targets, a sequential speller, and a generic multi-choice selection BCI.

Interestingly, we found that the accuracy of all such systems is represented by the following general equation:

$\displaystyle A = \left\{ \int_{\mathbb{R}} \mathrm{normpdf}\left(\xi\right) \left[ \mathrm{normcdf}\left(\beta \xi + \gamma\right) \right]^{\eta} \;\mathrm{d}{\xi} \right\}^{\lambda} \; .$ (13)

For example, it is easy to derive (12) from (13) by performing the substitution $ x=\sigma_{\bar U_{\hat r}}   \xi + \mu_{\bar U_{\hat r}}$ and with an appropriate choice of the constants $ \beta$, $ \gamma$, $ \eta$ and $ \lambda$ (indicated as case (a) in table 1). The values of the constants necessary to derive the accuracy of other spellers are shown in the table. Other more complex BCI systems could similarly be modelled.
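Equation (13) is easy to evaluate numerically. The Python sketch below does so; the mapping of the standard 6$\times$6 speller onto $(\beta, \gamma, \eta, \lambda)$ shown in the comments follows from the substitution described above, and should be checked against table 1.

\begin{verbatim}
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def accuracy(beta, gamma, eta, lam):
    # equation (13): A = { int normpdf(xi) [normcdf(beta*xi+gamma)]^eta dxi }^lam
    integrand = lambda xi: norm.pdf(xi) * norm.cdf(beta * xi + gamma) ** eta
    integral, _ = quad(integrand, -np.inf, np.inf)
    return integral ** lam

# Standard 6x6 speller (case (a)): beta = 1,
# gamma = (alpha_t - alpha_nt) * sqrt(J) / sigma, eta = 5, lambda = 2.
# Illustrative numbers: (alpha_t - alpha_nt)/sigma = 1 and J = 5 repetitions.
A = accuracy(beta=1.0, gamma=np.sqrt(5), eta=5, lam=2)
\end{verbatim}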


Table 1: Parameters to be used in (13) to model the accuracy of different ERP-based brain-computer interfaces



Modelling $h$-related effects

In section 4.2.1 we introduced a simple model of the first stage of Donchin's speller, which, in conjunction with a model of the second stage, was then used to derive an expression for the accuracy of the speller. The focus of this paper is really the second stage of the speller. However, the accuracy of the second stage, and, hence, of the speller, can only be evaluated if the statistical properties of the scores produced by the first stage are fully specified. In this section we want to extend the model of the first stage developed above so as to take into account the modulations of P300s associated with stimulus history documented in section 3. In section 6 we will propose and model a new algorithm for the second stage of the speller which will exploit such effects.


Model of the first stage in presence of $h$-related effects

So far we have modelled the flash scores produced by the first stage of the speller as a normally distributed stochastic variable whose mean depends exclusively on whether the flash is a target or a non-target. As discussed above, however, $h$ strongly affects the shape and amplitude of P300s and, so, it is reasonable to expect that it will also affect flash scores for target stimuli.

To take this into account, we propose to modify (4) by creating a separate version of it for each value of $h$ to which significant P300-amplitude changes can be ascribed. In other words we replace it with the following:

$\displaystyle Y_{\mathrm{t}h} = \alpha_{\mathrm{t}h} + \mathring Y$ (14)

where $ h=0,1,...,h_\mathrm{max}$ and the means $ \alpha_{\mathrm{t}h}$ account for the dependence of target flash scores on $h$. Based on the evidence gathered in section 3, we chose $ h_\mathrm{max}=9$. All epochs with $ h \ge 9$ were artificially assigned to class $ h=9$, which we will term ``9+'' in the following to make it easier to remember its contents. We, therefore, obtained ten versions of (14).

While we have no evidence that $h$ also significantly affects the shape and amplitude of the ERPs produced in the presence of non-targets in Donchin's speller, in principle this might happen with particular choices for the shape and timing of the stimuli or in other types of BCI. These situations could easily be modelled by generalising (3) as we did for the targets, e.g., by considering a model of the form

$\displaystyle Y_{\mathrm{nt}h} = \alpha_{\mathrm{nt}h} + \mathring Y \; .$ (15)

In appendix D we will look at possible causes of TTI-related modulations of flash scores in the presence of non-targets and we will explain how the theory and methods developed in the paper would be affected by the use of (15). However, as we will show at the end of section 5.3, this level of sophistication is unnecessary for Donchin's speller. So, we will continue to use the simpler (3) to model non-target scores.


Model identification

Based on the choices made in the previous section, in order to model the first stage of the speller in the presence of $h$-related effects we need to identify 12 parameters: the variance, $\sigma^2$, of the noise variable, $\mathring Y$, in (3) and (14); the $\alpha_{\mathrm{nt}}$ parameter in (3); and the ten $\alpha_{\mathrm{t}h}$ parameters in (14). Before we can proceed with this task, however, we need to choose a particular implementation of the first stage of the speller.

For the purpose of demonstrating the ideas, we decided to use an existing high-performance approach. More specifically, we borrowed the first stage from the work of Rakotomamonjy and Guigue [29] which won the BCI Competition III for the Donchin speller. Namely, we used an approach they called ``Ensemble SVM without channel selection'' which is easy to implement and outperforms other alternatives when using only 5 repetitions of the 12 stimuli to classify a character. In this approach, an ensemble of classifiers is used. The datasets are split into several subsets and a linear support vector machine (SVM) classifier is trained on each of them. When classifying unseen test data, the outputs of all classifiers are averaged to give a flash score representing the ``targetness'' of the flash.

We analysed the distribution of the flash scores produced by this approach on the BCI Competition data used also in section 3. As in [29], we divided the 85 data blocks of each subject into 17 subsets of 5 blocks each. To obtain realistic generalisation statistics and avoid overfitting problems, we applied a leave-one-out approach estimating the output of the ensemble classifier on a subset of data by averaging the output of 16 classifiers which were not trained using that subset. By repeating this estimation process for each of the 17 subsets we obtained a reliable estimate of the output of the ensemble on the full dataset of each subject.
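A minimal sketch of this ensemble scheme in Python follows, assuming a feature matrix \verb|X|, binary \verb|labels| and a \verb|subset_id| array assigning each epoch to one of the 17 subsets (all names are ours); the regularisation constant is an assumption for illustration, not the value used in [29].

\begin{verbatim}
import numpy as np
from sklearn.svm import LinearSVC

n_subsets = 17
svms = []
for s in range(n_subsets):
    clf = LinearSVC(C=0.01)              # C chosen for illustration only
    clf.fit(X[subset_id == s], labels[subset_id == s])
    svms.append(clf)

# leave-one-out: score epochs of subset s with the 16 SVMs not trained on it
scores = np.zeros(len(X))
for s in range(n_subsets):
    mask = subset_id == s
    others = [clf for t, clf in enumerate(svms) if t != s]
    scores[mask] = np.mean(
        [clf.decision_function(X[mask]) for clf in others], axis=0)
\end{verbatim}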

We partitioned the data into two classes based on whether the corresponding input epochs represented targets (t) or non-targets (nt). We represented the class of a flash as $\theta \in \{\mathrm{t},\mathrm{nt}\}$. Then, we divided each class into subclasses according to $h$, for $h \in \{0,\dots,h_\mathrm{max}\}$. We, therefore, obtained a total of $2 \times (h_\mathrm{max}+1) = 20$ partitions, each represented by a tuple $(\theta, h)$.

We computed the 12 parameters of the model of the first stage described in section 5.1 as follows. First, we evaluated the mean and variance of the flash scores associated with each partition $(\theta, h)$ using the 5%-truncated mean, $a_{\theta h}$, and the 5%-truncated variance, $s_{\theta h}^2$. These are robust measures of central tendency and spread which are less sensitive to outliers than their non-truncated counterparts. We then computed the average of the variances of the partitions

$\displaystyle s^2 = \frac{1}{2 \, (h_\mathrm{max}+1)} \sum_\theta \sum_h s^2_{\theta h} \;$

and the average of the means of the partitions representing non-targets

$\displaystyle a_\mathrm{nt} = \frac{1}{h_\mathrm{max}+1} \sum_h a_{\mathrm{nt} h} \;.$    

Finally, we estimated the model's parameters as follows: $\sigma^2 = s^2$, $\alpha_\mathrm{nt} = a_\mathrm{nt}$ and $\alpha_{\mathrm{t}h} = a_{\mathrm{t}h}$ for $h = 0, \dots, h_\mathrm{max}$.
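In code, this identification step reduces to a few lines. The sketch below assumes a dictionary \verb|scores| mapping each partition $(\theta, h)$ to its array of flash scores (our data layout), and cuts 5% from each tail.

\begin{verbatim}
import numpy as np
from scipy.stats import trim_mean

h_max = 9

def trimmed_var(x, cut=0.05):
    x = np.sort(x)
    k = int(len(x) * cut)
    return x[k:len(x) - k].var(ddof=1)    # variance of the central 90%

a = {p: trim_mean(v, 0.05) for p, v in scores.items()}       # a_{theta h}
s2 = np.mean([trimmed_var(v) for v in scores.values()])      # sigma^2
a_nt = np.mean([a[('nt', h)] for h in range(h_max + 1)])     # alpha_nt
alpha_t = [a[('t', h)] for h in range(h_max + 1)]            # alpha_{t h}
\end{verbatim}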

We performed this model-identification process independently for the two subjects in dataset II of the BCI Competition III. The resulting parameters are reported in table 2. Note how in both cases $\alpha_{\mathrm{t}h}$ markedly increases as $h$ increases, as also confirmed by linear regression. For example, the difference in flash score between non-targets and targets for $h=8$ is twice that for $h=0$. As the classification accuracy strongly depends on the distance between the two classes, it is clear that epochs with small $h$ (i.e., a target presented shortly after the previous one) represent an unfavourable condition for the speller.


Table 2: Parameters for the model in (3) and (14) estimated from the datasets of the two subjects in the BCI Competition III dataset.
Subject         $ \alpha_{\mathrm{t}0}$     $ \alpha_{\mathrm{t}1}$     $ \alpha_{\mathrm{t}2}$     $ \alpha_{\mathrm{t}3}$     $ \alpha_{\mathrm{t}4}$     $ \alpha_{\mathrm{t}5}$     $ \alpha_{\mathrm{t}6}$     $ \alpha_{\mathrm{t}7}$     $ \alpha_{\mathrm{t}8}$     $ \alpha_{\mathrm{t}9+}$         $ \alpha_\mathrm{nt}$         $ \sigma$    
A         -1.183     -1.188     -0.936     -0.867     -0.730     -0.566     -0.669     -0.588     -0.494     -0.534         -1.846         0.982    
B         -0.824     -0.783     -0.428     -0.490     -0.313     -0.417     -0.165     -0.212     -0.139     -0.198         -1.719         0.876    


Model validation

Having defined the model for the first stage and determined its parameters, we need to assess whether this model fits the actual flash scores resulting from the datasets.

We used a Kolmogorov-Smirnov test to check whether we could reject the hypothesis that the flash scores in each partition could be the result of sampling from the corresponding distributions in (3) and (14) with the parameters in table 2. Results indicated that we cannot reject this hypothesis for 19 out of 20 partitions (we discuss the possible reasons for the exception, partition nt1, later in this section). This confirms the substantial correctness of our assumption that the variables in (3) and (14) are normally distributed and all have the same variance.
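For reference, the per-partition test can be run as follows (a sketch; \verb|partition_scores|, \verb|mu| and \verb|sigma| stand for one partition's flash scores and the corresponding parameters from table 2):

\begin{verbatim}
from scipy.stats import kstest

stat, p = kstest(partition_scores, 'norm', args=(mu, sigma))
reject = p < 0.05      # normality rejected at the 5% level
\end{verbatim}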

To visually evaluate the goodness of fit between our models and the actual flash scores produced by the ensemble SVM, we show the QQ-plots [37] associated with each $(\theta, h)$ partition of each subject in figure 5. They confirm that the model fits the data very well in the key part of the distributions, i.e., between the 5th and the 95th percentiles.

Figure 5: QQ-plots for the flash scores in the $(\theta, h)$ partitions for subjects A and B in the BCI Competition dataset. For each partition and type of flash the quantiles corresponding to the samples in the partition are plotted against the corresponding quantiles of a standardized normal distribution. The dark thin lines represent the QQ-plots we would expect to see if the scores exactly followed normal distributions with the parameters listed in table 2. The dots represent actual data points. The horizontal dashed lines represent the 5th, 10th, 25th, 50th, 75th, 90th and 95th percentiles of the standardized normal distribution. So, data points below the bottom dashed line or above the topmost dashed line are in the tails of the distribution.
Subject A
\includegraphics[width=\textwidth]{qqplot_class_out_A}
Subject B
\includegraphics[width=\textwidth]{qqplot_class_out_B}

It is natural to ask why the Kolmogorov-Smirnov test indicated that the data in partition nt1 are unlikely to be the result of sampling from the stochastic variable $Y_{\mathrm{nt}} = \alpha_{\mathrm{nt}} + \mathring Y$ (i.e., (3)). Obviously, with the standard significance level of 5% which we adopted, we should expect a statistical test to give an incorrect answer on average once in 20 applications simply because of stochastic effects. However, this does not appear to be the main reason for this effect. The most likely cause is that the mean score for the epochs in partition nt1 is slightly smaller than for the other non-target partitions, as shown in figure 6, which plots the average score for targets and non-targets for different $h$ values. In appendix D we provide a possible explanation for this anomaly of nt1.

Irrespective of the causes of the minor anomaly of partition nt1, it is clear from the plots in figure 6 that the assumption that non-target score averages are not substantially influenced by $h$ is essentially correct for Donchin's speller. Thus, the adoption of a more sophisticated model (such as (15)), over the simpler (3), is unnecessary.

Figure 6: Average flash scores for targets (circles) and non-targets (squares) as a function of $h$. The target data correspond to the values in table 2.
\includegraphics[width=.49\textwidth]{Regression_TNT_A} \includegraphics[width=.49\textwidth]{Regression_TNT_B}


Exploiting $h$-related effects

In this section we will propose a new classification algorithm for Donchin's speller that exploits the flash-score modulations modelled and documented in section 5. This uses a modified second processing stage (where the system combines the flash scores into character scores and makes a final decision) which, in turn, will require some modification to the model of the accuracy of the speller presented in section 4.2.2.


A new character-scoring algorithm

Let us start by introducing some useful notation. Let $B$ be a set of integers representing the indices of some events in $L$ and let $\mathrm{h}(i, B)$ be a function that returns the number of flashes between the $i$-th flash and the last event in $B$ preceding that flash, i.e., $\mathrm{h}(i,B) = i - \max\,[B \cap \{1,2, \dots, i-1\}] - 1$. We want to apply this function to $B = \mathcal{R}_r \cup \mathcal{C}_c$, which represents the set of indices of events where either the row or the column of the character at position $(r,c)$ in the character grid of Donchin's speller flashed.

Let us denote by $\hat r$ and $\hat c$ the row and column containing the target character, respectively. Then, if the $i$-th flash is a target, $\mathrm{h}(i,\mathcal{R}_{\hat r} \cup \mathcal{C}_{\hat c})$ gives the value of $h$ for the $i$-th flash, i.e., the number of non-targets preceding the target flash $i$. Note, however, that in general the coordinates of the target character are unknown to the speller. So, the function $\mathrm{h}(i,\mathcal{R}_{r} \cup \mathcal{C}_{c})$ should not be interpreted as representing the true $h$ of the $i$-th flash, but as its $h$ value on the hypothesis that the target character is at position $(r,c)$. To reiterate, only when scoring the target character $(\hat r, \hat c)$ does the function $\mathrm{h}$ give the true $h$ value of the $i$-th flash.

To account for the effects of $h$, we propose to modify the scoring formula in (1) by weighting flash scores based on the values returned by the function $\mathrm{h}(i,\mathcal{R}_{r} \cup \mathcal{C}_{c})$. The scoring function we used is as follows:

\begin{displaymath}
\bar S_{r,c} = \frac{1}{J} \sum_{\rho \in \mathcal{R}_r} w(\mathrm{h}(\rho, \mathcal{R}_r \cup \mathcal{C}_c)) \, (y_\rho - b) + \frac{1}{J} \sum_{\chi \in \mathcal{C}_c} w(\mathrm{h}(\chi, \mathcal{R}_r \cup \mathcal{C}_c)) \, (y_\chi - b)
\end{displaymath} (16)

where $w$ is a function that determines the weight to be used when combining flash scores. Note that we also added a parameter, $b$, which allows us to change the bias of the first stage, effectively determining the balance between sensitivity and specificity of the classifier. It is easy to see that (1) is a special case of (16) obtained with the substitutions $w \equiv 1$ and $b=0$.

In the version of Donchin's speller considered in this paper, the codomain of the function $ \mathrm{h}(i,\mathcal{R}_{r} \cup \mathcal{C}_{c})$ is the set of integers $ \{0,1,\cdots, 20\}$ (see discussion at the end of section 3). So, the algorithm only evaluates the function $ w$ on points in that set. Thus, to fully specify the behaviour of the algorithm in (16) we need to know the value of the parameter $ b$ and the (codomain) values taken by the function $ w$ for domain values in the set $ \{0,1,\cdots, 20\}$. In other words, our scoring technique has a total of 22 degrees of freedom which need to be fixed before the algorithm can be used. We will do this in section 6.3 by using the theoretical model of (16) developed in the next section.
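The following Python sketch implements $\mathrm{h}(i, B)$ and the weighted scoring rule (16), with 0-based indices; when no earlier event of $B$ exists we assign the largest possible value, 20, which is our convention for illustration.

\begin{verbatim}
import numpy as np

def h_of(i, B):
    prev = [k for k in B if k < i]
    return i - max(prev) - 1 if prev else 20   # no earlier event in B

def weighted_character_score(y, ell, J, w, b):
    S = np.zeros((6, 6))
    for r in range(6):
        for c in range(6):
            # indices where the row or the column of (r, c) flashed
            B = [i for i, l in enumerate(ell)
                 if l in (f'R{r + 1}', f'C{c + 1}')]
            S[r, c] = sum(w(h_of(i, B)) * (y[i] - b) for i in B) / J
    return S
\end{verbatim}

Since $B$ collects both row and column flashes, the single sum over $B$ divided by $J$ reproduces the two $1/J$-weighted sums in (16).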


Modelling the accuracy of the new algorithm

In section 4.2 we modelled the accuracy of the standard Donchin speller disregarding the fact that flash scores are affected by $h$. We now need to modify that model to consider both the influence of $h$ on target-flash scores (which we studied in section 5) and also the use of the weighted scoring function in (16).

The procedure we used to derive the new model follows approximately the same strategy as in section 4.2: we need to define and study two stochastic variables representing the weighted sums of row and column flash scores, respectively, and then work out the probability with which the row and column containing the target will have a higher score than any other row and column.

Since the calculations involved are quite laborious, we provide the detailed treatment in appendix A. Here it is sufficient to note that the final model of the accuracy of the speller in the presence of $h$-related effects and using the weighted character-scoring function (16) is also a member of the family defined in (13), as was the case for the ordinary speller, although, naturally, the parameters $\beta$ and $\gamma$ are quite different in the two cases (see table 1, rows (a) and (f)).


Model identification/Algorithm optimisation

In principle, the probabilistic model of the accuracy of a Donchin speller controlled by the scoring function (16), developed in the previous section and in appendix A, gives us the opportunity to optimise the algorithm's parameters in such a way as to maximise performance. However, in practice, optimally identifying the 22 parameters of the algorithm (and of the corresponding model) from empirical data without incurring overfitting problems would require very extensive datasets, which are rarely available in BCI. Below we describe two workarounds for this problem and the reasons for our particular choices.

In all the data available to us, it is clear that the effect of $h$ on the P300 saturates for large enough values of $h$ (e.g., see figures 2 and 3). So, our first step to reduce the number of degrees of freedom of the algorithm was to clip the values returned by the function $\mathrm{h}(i,\mathcal{R}_{r} \cup \mathcal{C}_{c})$ to an upper value of $h_\mathrm{max}=9$, thereby reducing the domain of the function $w$ in (16), and, thus, its degrees of freedom, from the original 21 to 10, without significantly affecting performance. Our second step was to adopt a particularly simple form for $w$ which has only three degrees of freedom, thereby making reliable model identification possible even from relatively small datasets. More specifically, we chose $w$ to be a linear spline defined by the knots (abscissas) $[0, 3, 6, h_\mathrm{max}]$ and the corresponding ordinate values $[1, 1+c_1, 1+c_2, 1+c_3]$. With these choices, optimising (16) requires finding appropriate values for a parameter vector $\boldsymbol{q}=[c_1, c_2, c_3, b] \in \mathbb{R}^4$.
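A possible implementation of this three-parameter spline uses linear interpolation between the knots (a minimal sketch; conveniently, \verb|np.interp| clamps arguments above $h_\mathrm{max}$ to the last ordinate, which implements the clipping described above).

\begin{verbatim}
import numpy as np

def make_w(c1, c2, c3, h_max=9):
    knots = np.array([0.0, 3.0, 6.0, h_max])
    vals = np.array([1.0, 1.0 + c1, 1.0 + c2, 1.0 + c3])
    return lambda h: float(np.interp(h, knots, vals))

w = make_w(0.5, 1.0, 1.0)   # illustrative values, not the optimised ones
\end{verbatim}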

We chose this particular form for $w$ for two further reasons. Firstly, our representation of $w$ ensures that the standard Donchin speller is part of our search space. It is easy to see this by using the assignment $\boldsymbol{q}=\boldsymbol{0}$, which corresponds to setting $b=0$ and all the weights in (16) to 1. In this case, (16) collapses down to (1) and the model of the accuracy of the speller developed in the previous section gives us the accuracy of an ordinary Donchin speller in the presence of $h$-related effects, which, however, are not exploited when scoring characters (a situation depicted in row (g) of table 1). So, if for whatever reason using a weighted scoring function is not a good choice for a particular BCI, an optimisation algorithm can easily discover this and fall back on the standard scoring method. The second reason for considering $w$ functions in the neighbourhood of the constant function $w \equiv 1$ is a technical one. As discussed in appendix A.3, with this choice the estimation errors produced by our model due to simplifying assumptions are minimal. So, the optimal parameters for the algorithm obtained via our analytic approach are more likely to be close to the real global optimum for the speller.


Performance differences between the standard speller and our speller

Having constructed theoretical models of the accuracy of the standard speller and our speller, it is easy to see whether, with an appropriate choice of the parameter vector $ \boldsymbol {q}$, the use of (16) can improve performance w.r.t. using (1). All one has to do to study the relative performance of the two spellers is to evaluate the difference between the values taken by (13) when using the parameters in rows (f) and (g) of table 1, respectively.

Owing to space limitations, we cannot provide full details of such a comparison here: even with the constraints imposed on the new algorithm in section 6.3, we still have a 5-dimensional parameter space (the 4 components of $\boldsymbol{q}$ plus the repetition number, $J$). However, below we will look at performance differences in a representative subset of the parameter space.

The contour plots on the left of figure 7 show the difference between the accuracy (percentage of correctly spelled characters) obtained when using a weighted scoring function and the accuracy of the standard Donchin speller for the two subjects in dataset II of the BCI Competition III. The figure represents the subset of parameter space where $J=5$, $c_3 = c_2$ and $b = b_0 \stackrel{\mathrm{def}}{=} \frac{1}{2} \left( \alpha_\mathrm{nt} + \frac{1}{h_\mathrm{max}+1} \sum_h \alpha_{\mathrm{t}h} \right)$, which corresponds to ensuring that the average output for a balanced data set containing the same number of targets and non-targets is zero. With these simplifications we are left with only two degrees of freedom, $c_1$ and $c_2$, and we can, therefore, visualise accuracy differences using a plot. Specifically, in figure 7 (left, top and bottom) we studied how the performance of the two spellers differed as the parameters $c_1$ and $c_2$ took all possible values in the set $\{-0.3, -0.2, \dots, 2.0\}$.

Interestingly, we see that there are choices of $c_1$ and $c_2$ for which the use of a weighted scoring function worsens performance. However, an appropriate choice of $c_1$ and $c_2$ can produce significant performance improvements: up to approximately 4% for subject A and up to approximately 1% for subject B. Even bigger improvements can be obtained if one allows (as we will in section 7) the independent optimisation of the 4 components of $\boldsymbol{q}$ and the repetition number, $J$.


Monte Carlo validation

In section 7 we will indirectly corroborate the performance predictions made by our model in the previous section by testing the weighted scoring method experimentally on two independent test sets. However, before we do that, we would like to validate the model of our second stage more directly and independently, and confirm its predictions. In this section we will do this via Monte Carlo simulations.

To keep the effort of this expensive form of validation under control and for an easier visualisation of results, we again considered a subset of the 5-dimensional parameter space of the system. Namely, we set $J=5$, $c_3 = c_2$ and $b = b_0$ (as we did in the previous section) and ran a Monte Carlo simulation for each possible pair $(c_1,c_2) \in \{-0.3, -0.2, \dots, 2.0\}^2$.

The simulation was performed as follows. For each $ (c_1,c_2)$ we generated 100 million simulated sequences of stimuli using the following algorithm:

  1. Select a target character out of the possible 36 (e.g., ``A'', at position $ (1,1)$, in the top left corner of the display);
  2. Generate a vector $ \ell$ containing $ J \times 12 $ components. The vector is constructed by concatenating $ J$ random permutations of $ \{\mathsf{R}_1,\dots,\mathsf{R}_6,\mathsf{C}_1,\dots,\mathsf{C}_6\}$ (e.g., $ \ell=$ $ \mathsf{C}_1$, $ \mathsf{R}_3$, $ \mathsf{R}_5$, $ \mathsf{R}_2$, $ \mathsf{C}_2$, $ \mathsf{R}_1$, $ \mathsf{C}_6$, $ \mathsf{C}_4$, $ \mathsf{R}_6$, $ \mathsf{C}_3$, $ \mathsf{R}_4$, $ \mathsf{C}_5$, $ \mathsf{R}_1$, ...);
  3. Scan $ \ell$, element by element, and label each element, $ \ell_i$, with the corresponding $ (\theta ,h)$ partition, i.e., according to whether $ \ell_i$ is the target row or column and to the number of non-targets preceding it in $ \ell$ (e.g., the labelling of the example sequence shown above when the target is ``A'' is: t $ h_\mathrm{max}$, nt0, nt1, nt2, nt3, t4, nt0, nt1, nt2, nt3, nt4, nt5, t6, ...);
  4. Create a vector $ \boldsymbol{y}$ of size $ J \times 12 $. The $i$-th element of $ \boldsymbol{y}$ is obtained by randomly sampling from the set of flash scores associated with the partition assigned to element $ \ell_i$ in the previous step;
  5. Apply the scoring function in (16) to $ \boldsymbol{y}$ and $ \ell$ (the latter is used to find $r$ and $c$) and check whether the character with highest $ \bar S_{r,c}$ is the target character.
  6. Repeat step 5, using the non-weighted scoring function in (1).
By counting how many times the procedures in steps 5 and 6 correctly classify a character over a large number of repetitions and across all characters, one can precisely estimate the accuracy of the speller with the weighted and non-weighted scoring functions, respectively. Comparing the results, one can see the performance difference associated with each particular choice of $(c_1,c_2)$. A compact sketch of steps 1-4 is given below.
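The sketch assumes the \verb|scores| dictionary from the model-identification step and $h_\mathrm{max}=9$ as in section 5.1; steps 5 and 6 would then apply \verb|weighted_character_score| and \verb|standard_character_score| (defined earlier) to each simulated pair \verb|(y, ell)| and tally correct classifications.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
labels = [f'R{k}' for k in range(1, 7)] + [f'C{k}' for k in range(1, 7)]
h_max = 9

def simulate_sequence(r_t, c_t, scores, J=5):
    # step 2: J random permutations of the 12 row/column flashes
    ell = np.concatenate([rng.permutation(labels) for _ in range(J)])
    y, last_target = np.zeros(len(ell)), -(h_max + 1)
    for i, l in enumerate(ell):
        # step 3: label the flash with its (theta, h) partition
        h = min(i - last_target - 1, h_max)
        theta = 't' if l in (f'R{r_t + 1}', f'C{c_t + 1}') else 'nt'
        # step 4: sample a real flash score from that partition
        y[i] = rng.choice(scores[(theta, h)])
        if theta == 't':
            last_target = i
    return y, ell
\end{verbatim}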

Figure 7(right, top and bottom) shows the results of these Monte Carlo simulations as contour plots, side by side with the results estimated through our analytical model, for our two subjects. Let us analyse these results.

Figure 7: Contour plots of the difference between the accuracy (expressed as percentage of correctly spelled characters) obtained with a weighted scoring function in (16) and the accuracy of the original method. The improvement is plotted as a function of $ c_1$ and $ c_2$, the parameters that modulate the weighting function $ w(h)$. On the left we show the improvement estimated using the mathematical model as described in section 6.4. On the right we show the result obtained through the Monte Carlo simulation described in section 6.5. A square represents the point $ (c_2,c_1)$ of maximum improvement according to the model, while a star is the maximum according to the Monte Carlo simulation. The similarity between the left and right plots is striking, as well as the closeness of the maxima found by the two approaches, suggesting the substantial correctness of our mathematical model.
   Model (left column)  Monte Carlo (right column)

 \includegraphics[width=.48\textwidth]{perf_c2lin_mod_A}  \includegraphics[width=.48\textwidth]{perf_c2lin_mc_A}

 \includegraphics[width=.48\textwidth]{perf_c2lin_mod_B}  \includegraphics[width=.48\textwidth]{perf_c2lin_mc_B}

 (top row: subject A; bottom row: subject B)

Firstly, we should note the striking similarity between the contour plots obtained from the model and those resulting from Monte Carlo simulations. Apart from minor numerical differences, for each subject the shape and number of the contour lines match very closely across plots. Secondly, for both subjects, Monte Carlo simulations confirm that by appropriately weighting the flash scores, i.e., using (16) instead of (1), we can improve the accuracy of the speller considerably. Thirdly, the peaks of the performance improvement in the $ (c_1,c_2)$ plane predicted with the two methods are also extremely close. All of this strongly corroborates the validity of our models.


Testing the generalisation of weighted scoring functions

Having corroborated the model presented in section 6.2, in this section we will use it to optimise weighted scoring functions so as to improve the accuracy of the speller in two sets of experiments. Tests will be carried out with independent datasets which were not used to set any of the parameters of the algorithm.

In sections 6.4 and 6.5, for analysis and visualisation purposes, we restricted the exploration of the parameter space to only two degrees of freedom. However, since the model computes estimates of the accuracy of the speller very efficiently, we can evaluate many tentative parameter settings. Therefore, in this section, for each subject we will allow the exploration of the full 5-dimensional parameter space defined by the parameter $ J$ and the four components of $ \boldsymbol {q}$.


Tests with an independent test set from the BCI Competition III

For each subject in dataset II from BCI Competition III and for each value of $ J$, we determined the vector $ \boldsymbol{q}=[c_1, c_2, c_3, b]$ which, according to the model, resulted in the best accuracy for the speller. The parameters of the algorithm/model were determined using only the training data from dataset II. Then the optimal weighting function was tested on the test set of the BCI competition. To reiterate, this set was never used during any phase of the design of the system, nor for the training of the classifier or the weighting function. So, the results of this test give a true indication of the generalisation capabilities of the approach.

Table 3 and figure 8 show the results of our tests, reporting the accuracy of the speller on the test set for the two subjects when using the optimised weighted scoring function. We also report the results for the case $ \boldsymbol {q}=\boldsymbol {0}$, i.e., when using the standard character scoring method, which corresponds to the original method in [29]. Let us analyse the results.

Figure 8: Accuracy (percentage of correctly identified characters) of the speller on a test set for the two subjects in the BCI Competition III dataset when using the optimised weighting function (``opt $ \boldsymbol {q}$'') and with the original non-weighted algorithm (`` $ \boldsymbol {q}=\boldsymbol {0}$'') by Rakotomamonjy and Guigue [29].
\includegraphics[width=9cm]{perf_test_AB}


Table 3: Comparison of the accuracy (%) of the speller when using the optimised weighted scoring (``opt $ \boldsymbol {q}$'') w.r.t. the non-weighted version (`` $ \boldsymbol {q}=\boldsymbol {0}$'').
                                  J
                    1-3     4-6     7-9    10-12   13-15
 A   opt q         35.3    71.7    84.7    93.3    98.0
     q = 0         36.3    67.3    82.7    89.7
 B   opt q         58.0    80.7    92.0    96.0    96.7
     q = 0         57.0    78.7    90.0    95.7

For very small or very large values of $ J$ the improvement provided by a weighted scoring function is small or even slightly negative. For short sequences ($ J < 3$), this is probably ascribable to the fact that some of the approximations in the model become less accurate when $ J$ is very small (see A.2). Note that this does not imply that a weighted scoring function would not help with small values of $ J$: it simply means that, for this particular case, the parameters the model suggests for the scoring algorithm are sub-optimal. The weighting also has a small effect for long sequences, where the performance of both scoring techniques reaches 95% ($ J \geq 13$ for A and $ J \geq 11$ for B) and, so, there is little room for improvement.

However, for intermediate values of $ J$, the new method performs markedly better than the method in [29], i.e., the best performing Donchin speller published to date. More specifically, in the range of $ J$ values where the accuracy of the speller is between 70% and 95%, there is an improvement of around 3% on average. This is particularly important because 70-95% is the accuracy range for which the usability of a speller is highest. A speller with an accuracy below 70% can be frustrating for the user [38] and can prevent recipients from understanding messages. Conversely, an accuracy exceeding 95% improves the understandability of messages only marginally while markedly increasing the time needed to communicate, which can be equally frustrating.


Test with a further twelve independently acquired subjects

In order to confirm the applicability and benefits of the new approach on a larger and independent group of subjects, we tested the algorithm within our BCI lab on 12 further subjects. In this section we report the details and the results of these experiments.

Each participant was asked to spell a total of 20 characters. Target characters were randomly chosen before the beginning of the session. Each row and column of the standard 6$\times$6 matrix of characters was randomly intensified without replacement for 100 ms, with a gap of 50 ms, leading to an SOA of 150 ms. During the spelling of a character, each row and column flashed 10 times, for a total of 120 flashes (targets and non-targets combined). During that period, subjects were asked to focus on the target character and to mentally count the number of times it was highlighted.

Subjects were seated comfortably with their neck supported by a C-shaped inflatable travel pillow to reduce muscular artefacts. Their eyes were approximately 80 cm from a 22'' LCD screen with a 60 Hz refresh rate. Data were collected from 64 electrode sites using a BioSemi ActiveTwo EEG system. The EEG channels were referenced to the mean of two electrodes placed on the earlobes. Data were sampled at 2048 Hz, filtered and then downsampled by a factor of 8.

Cross-validation was used to assess the accuracy of the new algorithm as well as of the standard, non-weighted version of the speller. More specifically, the dataset of each subject was split into 10 subsets, each including the data related to two target characters. One subset was then used as the test set while the remaining nine formed the training set. Using the software in [39], an ensemble classifier was built by training one SVM on each of the 9 subsets of the training set. Then, using the same leave-one-out approach as in section 5.1, we found the parameters $ \alpha_{\mathrm{t}h}$, $ \alpha_\mathrm{nt}$ and $ \sigma$ for the model in (3) and (14). Given these parameters, we used the analytical model in section 6.2 to find the optimal vector of parameters $ \boldsymbol {q}$. We then used the ensemble classifier and the weighted scoring function associated with $ \boldsymbol {q}$ to classify the test subset. For reference, we did the same with the non-weighted scoring function. This procedure was repeated using, in turn, each of the 10 subsets as the test set and the remaining nine as the training set. The results were then combined to obtain an average accuracy measure across the 20 characters acquired with each subject.
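The following schematic Python sketch outlines the cross-validation loop just described. scikit-learn's LinearSVC stands in for the ensemble SVM of [39], the array names (epochs, labels, char_id) are our assumptions about how the preprocessed data are organised, and the optimisation of $ \boldsymbol {q}$ is only indicated by a comment.

\begin{verbatim}
import numpy as np
from sklearn.svm import LinearSVC

def crossvalidate(epochs, labels, char_id, n_folds=10):
    # `epochs`: one feature vector per flash; `labels`: target/non-target;
    # `char_id`: the spelled character each flash belongs to (20 in total)
    chars = np.unique(char_id)
    folds = np.array_split(chars, n_folds)        # 2 characters per fold
    flash_scores = []
    for test_chars in folds:
        test = np.isin(char_id, test_chars)
        # the actual system trains one SVM per training subset and combines
        # their outputs; a single classifier keeps this sketch short
        clf = LinearSVC().fit(epochs[~test], labels[~test])
        # these held-out flash scores feed the model of section 6.2, which
        # returns the optimal q used by the weighted character scorer
        flash_scores.append(clf.decision_function(epochs[test]))
    return flash_scores
\end{verbatim}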


Table 4: Comparison of the accuracy (%) of the speller when using the optimised weighted scoring (``opt $ \boldsymbol {q}$'') w.r.t. the non-weighted version (`` $ \boldsymbol {q}=\boldsymbol {0}$'') with the 12 subjects tested within our lab.
                                   J
 Subject           1     2     3     4     5     6     7     8     9    10
  1  opt q        45    85   100    95   100   100   100   100   100   100
     q = 0        50    85    90    95   100   100   100   100   100
  2  opt q        80   100    95   100   100   100   100   100   100   100
     q = 0        70   100    95   100   100   100   100   100   100
  3  opt q        40    70    90    95    95    95   100    95    95    95
     q = 0        45    65    85    95    90    95    95    95    95
  4  opt q        55    95    95   100    95   100   100   100   100   100
     q = 0        60    85    95   100    95   100   100   100   100
  5  opt q        65    60    85    80    90    90   100   100   100    95
     q = 0        55    60    85    85    95    90   100   100   100
  6  opt q        70    90   100   100   100   100   100   100   100   100
     q = 0        70    80    95   100   100   100   100   100   100
  7  opt q        10    20    40    65    85    85    80    85    90    95
     q = 0        15    15    40    60    75    70    80    90    85
  8  opt q        55    90    95    95   100    95   100   100    95   100
     q = 0        55    90    95    95   100    95   100   100    95
  9  opt q        65    90    95   100   100   100   100   100   100   100
     q = 0        70    80    95   100   100   100   100   100   100
 10  opt q        35    65    90    90    95    95    95    95   100   100
     q = 0        35    60    90    90    95    95    95    95   100
 11  opt q        45    65    80    90    95    95   100   100   100   100
     q = 0        45    65    75    90    95    95   100   100   100
 12  opt q        35    75    90    95    95    95   100   100   100   100
     q = 0        45    80    85    95    95    95    95   100   100

Table 4 reports the results of this procedure for the 12 subjects and for 10 different values of the number of repetitions, $ J$. In total there are 120 pairs of accuracy figures in the table. These are multiples of 5% because each subject was tested on 20 characters, which makes small performance differences difficult to resolve. Nonetheless, our new algorithm performs better than the standard speller in 21 cases, and when it does so the average accuracy improvement is approximately 7.4%, i.e., between one and two characters more out of 20. Conversely, in only 10 cases does the standard speller beat our algorithm, and when it does so the average performance difference is 5.5%, corresponding to approximately one character. Additionally, 6 out of these 10 cases are for $ J=1$, where we already know from the previous section that our model becomes less reliable and so our algorithm is likely to be sub-optimally parametrised. So, overall, the results are quite encouraging.

As the accuracy figures for different values of $ J$ are strongly correlated, we could not test the statistical significance of the data in table 4 using more than one value of $ J$ per subject. However, it seemed inappropriate to choose one particular value of $ J$, since different algorithms may work best for different $ J$'s and the optimal value of $ J$ (in terms of usability -- more on this below) appears to be subject dependent. Therefore, for statistical testing we decided to take, for each subject, the minimum value of $ J$ for which either method reached an accuracy of at least 80%, which provides a reasonable trade-off between accuracy and speed (see the discussion at the end of section 7.1). Table 5 reports the value of $ J$ chosen with this method and the corresponding accuracy of the two algorithms for each subject. Note that the data in this table are a subset of the data in table 4.

We ran a paired t-test on the data from table 5. The test confirmed that the accuracy improvement of the weighted scoring algorithm w.r.t. the non-weighted one is statistically significant ($ p$=0.014, $ t$=2.93), with a 95%-confidence interval for the improvement of 1.14-8.03%. In other words, we obtained significant improvements in accuracy with our own subjects as well. These improvements are compatible with the values estimated by our probabilistic model and Monte Carlo simulations, and with those obtained on the BCI competition test set.
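For reference, the paired test takes only a couple of lines with scipy; the two arrays below are illustrative placeholders to be filled with the per-subject accuracies of table 5.

\begin{verbatim}
from scipy import stats

acc_weighted = [85, 80, 90, 95, 85, 90, 85, 90, 90, 90, 80, 90]  # placeholder
acc_plain    = [80, 75, 85, 90, 85, 85, 80, 90, 85, 90, 75, 85]  # placeholder
t, p = stats.ttest_rel(acc_weighted, acc_plain)
print(f"t = {t:.2f}, p = {p:.4f}")
\end{verbatim}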

All this evidence points clearly in one direction: significant benefits can be accrued by considering and exploiting the modulations in P300s induced by stimulus history.


Table 5: Accuracy (%) of the speller for the two algorithms, choosing for each subject the minimum number of repetitions $ J$ for which either one reaches an accuracy of 80%.

 Subject                        1    2    3    4    5    6    7    8    9   10   11   12
 $J$ such that acc $\ge$ 80%    2    1    3    2    3    2    5    2    2    3    3    3
 opt q                         85   80   90   95   85   90   85   90   90   90   80   90
 q = 0                         85   70   85   85   85   80   75   90   80   90   75   85


Discussion

Amplitude and shape variations in brain waves are often considered harmful for BCIs. However, as we suggested in [3,4], in some cases they carry information which can be exploited to improve performance.

Our system embraces this philosophy by considering, modelling and exploiting TTI-related P300 modulations. This is done at two levels. In the system's second stage (the character scorer), we use different weighting factors for the responses of the first stage to flashes. The weights are based on flash latencies with respect to some earlier reference event, thereby making it possible for the system to adapt to, or exploit, history effects on brain waves. We also use a model which takes TTI-related P300 modulations into account to optimally set the weights and biases of such a system from a training set. The model explicitly represents the modulations in the flash scores produced by the first stage in the presence of TTI variations, translates them into corresponding character scores and integrates the results into a formula which gives the probability of correct classification as a function of the system's parameters.

The good results obtained with this approach suggest that the modulations in P300 related to differences in TTI are not necessarily a problem within the Donchin speller paradigm, provided one takes such variations into account when designing the system, as we did, instead of averaging over them.

Naturally, this is not the only way of improving performance in a Donchin speller. An alternative is, for example, to modify the protocol in such a way as to reduce TTI-related modulations in ERPs. For instance, [40] suggested that performance improvements could be obtained by using stimuli where the letters being flashed were not all in the same row or column. Groups of flashing letters were chosen so as to maximise the distance between the representations of the outputs of the first stage (over multiple repetitions of a complete sequence of flashes) for different characters, on the assumption that the first-stage classifier was 100% correct. This particular choice provides the system with error-correction capabilities which may help in the recognition of characters. The benefits of this idea were particularly clear when using a new type of stimuli where each letter of the matrix was placed in a grey rectangle with either horizontal or vertical orientation and letters were activated by switching the orientation of the rectangle between these two states. It was suggested that these stimuli may generate psycho-physiological responses which are less susceptible to TTI modulations than the original flashes. It would be interesting to combine such stimuli with our method to see if the two techniques can work synergistically to further improve performance.

An important question regarding performance is whether it is assessed with online or offline data and analyses. It is clear that the ultimate goal should be to evaluate performance by testing a system online, i.e., in exactly the same conditions experienced by the intended users of the system. However, after weighing the benefits and the obstacles associated with testing our system online, we decided to postpone an online comparison to a follow-up paper and to perform an offline analysis at this stage. The main reason is that with an offline analysis we can compare the two algorithms on the exact same datasets, which in turn allows the use of a paired t-test to establish the statistical significance, or otherwise, of performance differences. This test has greater statistical power than an unpaired test, thereby making it possible to draw statistically sound inferences with far fewer subjects. Testing the two algorithms with online systems would have added a noise factor, due to the two algorithms classifying different EEG signals, and would have prevented us from using a paired test.

Recent work [41,42] has shown that the performance of P300-based spellers is significantly affected by whether users gaze at target letters in addition to focusing their attention on them (overt attention) or gaze at a fixation element of the screen and use covert attention to focus on target letters. Performance is significantly higher in the overt attention case. This is most likely because, with overt attention, exogenous visual ERPs are modulated in addition to the P300 and other endogenous components (contrary to what happens with covert attention). As a result, in the case of overt attention, classifiers can exploit extra information when making decisions. As suggested in [42], this implies that the resulting BCIs are not truly independent and, thus, future work should attempt to create systems which decouple attention from gaze. In such systems it will be even more important to use strategies such as the methods suggested in this paper to make the best use of the information available in endogenous ERPs, including their modulations due to stimulus history.

Finally, we would like to mention that BCIs relying on P300s evoked through other sensory modalities, such as spatial auditory [43] or tactile [44] stimuli, have recently been proposed. There is no reason to believe that the techniques presented in this paper to better exploit visual ERPs could not also be applied to ERPs generated by stimuli presented in a different modality.


Conclusions

In this paper we first documented, modelled and then exploited, within the context of the Donchin speller, a modulation in the amplitude of P300s associated with variations in the number of non-targets preceding a target. In particular, we specialised the system through the use of a new character-scoring function which, with minimal effort, allows it to take such P300 modulations into account.

We mathematically assessed the potential of the approach by means of a statistical model based on a few reasonable assumptions, which we later verified. Validation by Monte Carlo simulations showed that the model is accurate.

Testing the new approach on unseen data revealed an average improvement in accuracy of between 2 and 4% over the best classification algorithm for the Donchin speller in the literature [29], in the range of sequence lengths most suited for practical use of a speller.

The model and method we developed are quite general and can be applied to a wide spectrum of ERP-based BCIs, including the Donchin speller with different matrix sizes, a sequential speller, a generic multi-choice selection BCI, and many others. In future research we intend to explore the possibility of obtaining similar improvements within other BCI paradigms based on P300s, including our own BCI Mouse [4].

The authors would like to thank Mathew Salvaris for his valuable comments and help. We would also like to thank Francisco Sepulveda for his help in revising the manuscript.

This work was supported by the Engineering and Physical Sciences Research Council (UK) under grant EP/F033818/1 ``Analogue Evolutionary Brain Computer Interfaces''.


Modelling the accuracy of the speller based on weighted scoring

In our model of the first stage of the Donchin speller in the presence of history effects, we assumed that the distribution of flash scores follows (3) for non-targets (see section 4.2.1) and (14) for targets (see section 5.1). In this appendix we will formally derive a model of the second stage so as to assess the accuracy of the speller when making decisions after a sequence of $ J$ stimulus repetitions using the scoring function in (16).


Scoring characters

Let us start by considering a single repetition of 12 flashes -- one for each row and column -- within a longer sequence of repetitions (we will extend the treatment to the multi-repetition case later). Then (16) can be rewritten in statistical terms as

\begin{displaymath}\begin{split}
 S_{r,c}
 &= w(\mathrm{h}(\rho,\mathcal{R}_r\cup \mathcal{C}_c)) \; (Y_\rho - b) \\
 &\quad + w(\mathrm{h}(\chi,\mathcal{R}_r\cup \mathcal{C}_c)) \; (Y_\chi - b)
 \end{split}\end{displaymath} (17)

where $ \rho$ and $ \chi$ satisfy $ L_\rho = \mathsf{R}_r$ and $ L_\chi = \mathsf{C}_c$, respectively. In (17) the randomness is due to $ L$ (through $ \rho$, $ \chi$, $ \mathcal{R}_r$ and $ \mathcal{C}_c$) and to the random variables $ Y_i$, for $ i\in\{\rho,\chi\}$.

If the target character is at position $ (\hat r, \hat c)$, according to (3) and (14) the variables $ Y_i$ can be expressed as:

$\displaystyle Y_i =
 \begin{cases}
 \alpha_{\mathrm{nt}} + \mathring Y & \text{if } L_i \notin \{\mathsf{R}_{\hat r}, \mathsf{C}_{\hat c}\},\\
 \alpha_{\mathrm{t}\;\mathrm{h}(i,\mathcal{R}_{\hat r}\cup \mathcal{C}_{\hat c})} + \mathring Y & \text{otherwise},
 \end{cases}\ $ (18)

where the function $ \mathrm{h}$ was defined at the beginning of section 6.1. Note that the variable $ \mathring Y$ is independently instantiated for each $i$.

To avoid confusion, in the following, every time two independently instantiated variables appear in the same expression, we will use different Roman superscripts to highlight the fact that they are variables with the same distribution, but not the same random variable.

Using (18), (17) becomes

\begin{displaymath}\begin{split}
 S_{r,c} = \;
 & \mathrm{w}(\mathrm{h}(\rho,\mathcal{R}_r\cup \mathcal{C}_c)) \left( \textsl{[}r = \hat r\textsl{]} \, \alpha^b_{\mathrm{t}\;\mathrm{h}(\rho,\mathcal{R}_{\hat r}\cup \mathcal{C}_{\hat c})} + \textsl{[}r \neq \hat r\textsl{]} \, \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{1}} \right) \\
 & + \mathrm{w}(\mathrm{h}(\chi,\mathcal{R}_r\cup \mathcal{C}_c)) \left( \textsl{[}c = \hat c\textsl{]} \, \alpha^b_{\mathrm{t}\;\mathrm{h}(\chi,\mathcal{R}_{\hat r}\cup \mathcal{C}_{\hat c})} + \textsl{[}c \neq \hat c\textsl{]} \, \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{2}} \right)
 \end{split}\end{displaymath} (19)

where $ \textsl{[}\dots\textsl{]}$ are Iverson brackets (i.e., they return 1 if the condition in brackets is satisfied, and 0 otherwise), and we used the definitions $ \alpha^b_{\mathrm{t}\;\mathrm{h}(.)} \stackrel{\mathrm{def}}{=} \alpha_{\mathrm{t}\;\mathrm{h}(.)} - b$ and $ \alpha^b_{\mathrm{nt}}\stackrel{\mathrm{def}}{=}\alpha_{\mathrm{nt}}-b$ for conciseness.

Because of the variability of $ \rho$, $ \chi$, $ \mathcal{R}_r$, $ \mathcal{C}_c$, $ \mathcal{R}_{\hat r}$ and $ \mathcal{C}_{\hat c}$, the functions $ \mathrm{h}(\rho,\mathcal{R}_r \cup \mathcal{C}_c)$ and $ \mathrm{h}(\chi,\mathcal{R}_r \cup \mathcal{C}_c)$ are stochastic. For notational convenience we represent them with a new stochastic variable, $ H$, which we need to characterise. To simplify our treatment, we make use of the assumption that the number of flashes occurring between the flashing of two particular rows or columns of interest is independently and identically distributed across epochs. This is a first-order approximation because it ignores the constraint that each row and column flashes exactly once within each group of 12 flashes. Under this assumption, however, the $ H$ variables associated with different $ \rho$, $ \chi$, $r$, $c$ become i.i.d.
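Under this i.i.d. approximation, the probability masses $ p_h$ of $ H$ used below can be estimated empirically by counting the gaps between flashes of a given row/column pair over many random permutation sequences. A minimal sketch (the values of $ J$ and of the cap on $h$ are illustrative):

\begin{verbatim}
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
J, H_MAX, N = 5, 10, 20000
counts = Counter()
for _ in range(N):
    ell = np.concatenate([rng.permutation(12) for _ in range(J)])
    pos = np.flatnonzero((ell == 0) | (ell == 6))  # flashes of one row/column pair
    counts.update(min(int(g) - 1, H_MAX) for g in np.diff(pos))
total = sum(counts.values())
p_h = {h: counts[h] / total for h in sorted(counts)}
print(p_h)
\end{verbatim}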

Using $ H$, (19) simplifies. However it takes different forms depending on whether we are scoring a target character, a ``half-target'' (i.e., a character belonging to either the same row or the same column of the target, but not both) or a non-target. Namely:

$\displaystyle \hspace{-2cm} S_{\hat r, \hat c}$ $\displaystyle = \mathrm{w}(H^{\Romannum{1}}) \: \left[ \alpha^b_\mathrm{t\;H^{\Romannum{1}}} + \mathring Y^{\Romannum{1}}\right] + \mathrm{w}(H^{\Romannum{2}}) \: \left[ \alpha^b_\mathrm{t\;H^{\Romannum{2}}} + \mathring Y^{\Romannum{2}}\right]$    (target) (20)

$\displaystyle \hspace{-2cm} S_{\hat r,\ast}$ $\displaystyle = \mathrm{w}(H^{\Romannum{3,\ast}}) \: \left[ \alpha^b_\mathrm{t\;H^{\Romannum{1}}} + \mathring Y^{\Romannum{1}}\right] + \mathrm{w}(H^{\Romannum{4,\ast}}) \: \left[ \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{3,\ast}}\right]$    (row half-target) (21)

$\displaystyle \hspace{-2cm} S_{\ast, \hat c}$ $\displaystyle = \mathrm{w}(H^{\Romannum{5,\ast}}) \: \left[ \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{4,\ast}}\right] + \mathrm{w}(H^{\Romannum{6,\ast}}) \: \left[ \alpha^b_\mathrm{t\;H^{\Romannum{2}}} + \mathring Y^{\Romannum{2}}\right]$    (col. half-target) (22)

$\displaystyle \hspace{-2cm} S_{\ast, \ast}$ $\displaystyle = \mathrm{w}(H^{\Romannum{7,\ast}}) \: \left[ \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{5,\ast}}\right] + \mathrm{w}(H^{\Romannum{8,\ast}}) \: \left[ \alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{6,\ast}}\right]$    (non-target) (23)

where the star symbol stands for any non-target line, i.e., any $ r \neq \hat r$ when used as a row index, or any $ c \neq \hat c$ when used as a column index.

The target character $ (\hat r, \hat c)$ will be correctly identified if and only if the stochastic variable $ S_{\hat r, \hat c}$ takes a value greater than the maximum taken by the variables in the set $ \{S_{\hat r,\ast}\} \cup \{S_{\ast, \hat c}\} \cup \{S_{\ast,\ast}\}$. We want to compute the probability of this happening.

This task is quite difficult. In order to simplify the treatment we will make an approximation which leads to slightly over-estimating the accuracy of the speller. Namely, we will assume that misclassifications can only occur because the score of one of the 10 most likely competitors, the ``half-targets'', is higher than the value taken by $ S_{\hat r, \hat c}$. In other words, we neglect the possibility of one of the 25 non-targets (corresponding to (23)) causing a misclassification. We will therefore drop (23) hereafter.

Equations (20)-(22) can be rewritten as

$\displaystyle S_{\hat r, \hat c}$ $\displaystyle = U'_{\hat r} + V'_{\hat c} \;,$ (24)

$\displaystyle S_{\hat r,\ast}$ $\displaystyle = \frac{ \mathrm{w}(H^{\Romannum{3,\ast}}) }{ \mathrm{w}(H^{\Romannum{1}}) } \, U'_{\hat r} + V'_{\ast} \;,$ (25)

$\displaystyle S_{\ast, \hat c}$ $\displaystyle = U'_{\ast} + \frac{ \mathrm{w}(H^{\Romannum{6,\ast}}) }{ \mathrm{w}(H^{\Romannum{2}}) } \, V'_{\hat c} \;,$ (26)

where

$\displaystyle U'_{\hat r}$ $\displaystyle = \mathrm{w}(H^{\Romannum{1}}) \, \left[\alpha^b_\mathrm{t\;H^{\Romannum{1}}} + \mathring Y^{\Romannum{1}}\right]\;,$ (27)

$\displaystyle U'_{\ast}$ $\displaystyle = \mathrm{w}(H^{\Romannum{5,\ast}}) \, \left[\alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{4,\ast}}\right]\;,$ (28)

$\displaystyle V'_{\hat c}$ $\displaystyle = \mathrm{w}(H^{\Romannum{2}}) \, \left[\alpha^b_\mathrm{t\;H^{\Romannum{2}}} + \mathring Y^{\Romannum{2}}\right]\;,$ (29)

$\displaystyle V'_{\ast}$ $\displaystyle = \mathrm{w}(H^{\Romannum{4,\ast}}) \, \left[\alpha^b_{\mathrm{nt}} + \mathring Y^{\Romannum{3,\ast}}\right]\;$ (30)

are stochastic variables.

When a sequence of $ J$ repetitions is used to score characters and make decisions, the average, $ \bar S_{\hat r, \hat c}$, of $ J$ instantiations of $ S_{\hat r, \hat c}$ has to be compared with the corresponding half-target averages $ \bar S_{\hat r,\ast}$ and $ \bar S_{\ast, \hat c}$. These can be expressed by trivially generalising (24)-(26) as follows:

$\displaystyle \bar S_{\hat r, \hat c}$ $\displaystyle = \bar U'_{\hat r} + \bar V'_{\hat c} \;,$ (31)

$\displaystyle \bar S_{\hat r,\ast}$ $\displaystyle = \overline{\frac{ \mathrm{w}(H^{\Romannum{3,\ast}}) }{ \mathrm{w}(H^{\Romannum{1}}) } \, U'_{\hat r}} + \bar V'_{\ast} \;,$ (32)

$\displaystyle \bar S_{\ast, \hat c}$ $\displaystyle = \bar U'_{\ast} + \overline{\frac{ \mathrm{w}(H^{\Romannum{6,\ast}}) }{ \mathrm{w}(H^{\Romannum{2}}) } \, V'_{\hat c}} \;,$ (33)

where a horizontal bar over a variable or an expression represents a new stochastic variable which is the average of $ J$ independent instances of that variable or expression.

The purpose of $ \mathrm{w}(h)$ in our scoring equation is to slightly modulate the effect that different values of $ Y$ have on the average, according to their $h$. So we will choose weights that are distributed around 1. Under this restriction, it is safe to assume that the value $ r_\mathrm{max}$ such that $ \bar U'_{r_\mathrm{max}} = \bar U'_{\mathrm{max}} = \max\{\bar U'_*\}$ also satisfies $ \bar S_{r_\mathrm{max}, \hat c} = \max\{\bar S_{\ast, \hat c}\}$. In other words, in (33) we allow the second term to partly contribute to the value of the score but we assume that its value is never so big as to change which one of the non-target rows has the maximum score.

Under this assumption

$\displaystyle \bar S_{\hat r, \hat c} > \max\{\bar S_{\ast, \hat c}\}
 \quad \Longleftrightarrow \quad
 \bar U'_{\hat r} + \overline{\Delta U} > \bar U'_{\mathrm{max}}$ (34)

where

$\displaystyle \hspace{-1.5cm}
 \Delta U = V'_{\hat c} - \frac{ \mathrm{w}(H^{\Romannum{6,\ast}}) }{ \mathrm{w}(H^{\Romannum{2}}) } \, V'_{\hat c}
 = \left[\mathrm{w}(H^{\Romannum{2}}) - \mathrm{w}(H^{\Romannum{6,\ast}})\right] \left[
 \alpha^b_\mathrm{t\;H^{\Romannum{2}}} + \mathring Y^{\Romannum{2}}\right] \; .$

Similar expressions can be obtained for the columns and the $ V$ variables.

Combining these results we obtain:

$\displaystyle \hspace{-1.5cm}
 \bar S_{\hat r, \hat c} > \max \left\{ \{\bar S_{\hat r,\ast}\} \cup \{\bar S_{\ast, \hat c}\} \right\}
 \quad \Longleftrightarrow \quad
 \bar U''_{\hat r} > \bar U''_{\mathrm{max}} \quad \wedge \quad
 \bar V''_{\hat c} > \bar V''_{\mathrm{max}}$ (35)

where $ \bar U''_{\hat r} = \bar U'_{\hat r} + \overline{\Delta U}$, while $ \bar U''_{\mathrm{max}} = \bar U'_{\mathrm{max}}$ (and likewise for the $ V$ variables).


Probability distributions

We now need to find the distributions of $ \bar U''_{\hat r}$, $ \bar U''_{\mathrm{max}}$, $ \bar V''_{\hat c}$, and $ \bar V''_{\mathrm{max}}$. In the following we will concentrate on the $ U$ variables but the results also apply to the $ V$ variables because their distributions are the same.

The variables $ \bar U''_{\hat r}$ and $ \bar U''_{\mathrm{max}}$ depend on $ \bar U'_{\hat r}$, $ \bar U'_{\ast}$, and $ \overline{\Delta U}$. These, in turn, are averages of the variables $ U'_{\hat r}$, $ U'_{\ast}$ and $ \Delta U$. So, let us start by finding the distributions of $ U'_{\ast}$ and $ U'_{\hat r}$.

Using the theorem in B, we can express the cumulative distribution functions (cdf) of $ U'_\ast$ and $ U'_{\hat r}$ as:

$\displaystyle \mathrm{F}_{U'_{\hat r}}(x)$ $\displaystyle = \sum_{h} p_h \: \mathrm{normcdf}\left(\frac{x - \mathrm{w}(h)
   \alpha^b_{\mathrm{t}\; h}}
 {\mathrm{w}(h)   \sigma}\right)$ (36)
$\displaystyle \mathrm{F}_{U'_\ast}(x)$ $\displaystyle = \sum_{h} p_h \: \mathrm{normcdf}\left(\frac{x - \mathrm{w}(h)   \alpha^b_{\mathrm{nt}}}
 {\mathrm{w}(h)   \sigma}\right) \; .$ (37)

The corresponding probability density functions (pdf) are:

$\displaystyle \mathrm{f}_{U'_{\hat r}}(x)$ $\displaystyle = \sum_{h} p_h \, \frac{1}{\mathrm{w}(h) \, \sigma} \,
 \mathrm{normpdf}\left(\frac{x - \mathrm{w}(h) \, \alpha^b_{\mathrm{t}\;h}}{\mathrm{w}(h) \, \sigma}\right)$ (38)

$\displaystyle \mathrm{f}_{U'_\ast}(x)$ $\displaystyle = \sum_{h} p_h \, \frac{1}{\mathrm{w}(h) \, \sigma} \,
 \mathrm{normpdf}\left(\frac{x - \mathrm{w}(h) \, \alpha^b_{\mathrm{nt}}}{\mathrm{w}(h) \, \sigma}\right) \; .$ (39)

Using the theorem in C, from (36) and (37) we obtain:

$\displaystyle \mu_{U'_{\hat r}}$ $\displaystyle = \sum_{h} p_h \mathrm{w}(h)\: \alpha^b_{\mathrm{t}\;h}$    
$\displaystyle \sigma_{U'_{\hat r}}^2$ $\displaystyle = \sum_{h} p_h (\mathrm{w}(h)^2 \sigma^2 +
 (\mathrm{w}(h) \alpha^b_{\mathrm{t}\;h} - \mu_{U'_{\hat r}})^2)$    
$\displaystyle \mu_{U'_\ast}$ $\displaystyle = \sum_{h} p_h \mathrm{w}(h) \:
 \alpha^b_{\mathrm{nt}}$    
$\displaystyle \sigma_{U'_\ast}^2$ $\displaystyle = \sum_{h} p_h \left(\mathrm{w}(h)^2 \sigma^2 +
 (\mathrm{w}(h) \alpha^b_{\mathrm{nt}} - \mu_{U'_\ast})^2 \right)   .$    

We now want to find the distributions of the averages of $ J$ instances of $ U'_{\ast}$ and $ U'_{\hat r}$, namely $ \bar U'_{\ast}$ and $ \bar U'_{\hat r}$, respectively. We will do so by using the central limit theorem.

According to the central limit theorem, the distribution of the average of a sufficiently large number of i.i.d. random variables can be considered normal, and its mean and variance can be estimated from the mean and variance of the distribution of the original variables. In our case the number of terms averaged ranges from a few up to 15. These numbers may look too small for the application of the central limit theorem. However, results on the rate of convergence of the central limit theorem based on pseudo-moments (e.g., [45]) indicate that faster convergence can be obtained when the original distribution is close to a normal distribution, which is the case in our model.

From (38) and (39), it follows that the probability density functions of $ U'_{\hat r}$ and $ U'_\ast$ are sums of scaled and shifted Gaussians. If the differences among the values of $ \mathrm{w}(h)$ as well as the differences among the values of $ \alpha^b_{\mathrm{t}\;h}$ are small compared to $ \sigma$ and the values of $ \mathrm{w}(h)$ and $ \alpha^b_{\mathrm{t}\;h}$ are of the same order of magnitude as $ \sigma$, the Gaussians in (38) and (39) are close and merge into a bell-like distribution which closely resembles a Gaussian. Hence, the central limit theorem gives reasonably accurate approximations even for $ J \leq 15$.

Using the central limit theorem, $ \bar U'_{\hat r}$ and $ \bar U'_\ast$ are normally distributed with parameters: $ \mu_{\bar U'_{\hat r}} = \mu_{U'_{\hat r}}$, $ \sigma_{\bar U'_{\hat r}}^2 = \sigma_{U'_{\hat r}}^2 / J$, $ \mu_{\bar U'_\ast} = \mu_{U'_\ast}$, $ \sigma_{\bar U'_\ast}^2 = \sigma_{U'_\ast}^2 / J$.

Through a procedure similar to that in B, we find that the variable $ \Delta U$ has the following cdf:

$\displaystyle \mathrm{F}_{\Delta U}(x) =
 \sum_{h_1} \sum_{h_2} p_{h_1} \, p_{h_2} \:
 \mathrm{normcdf}\left( \frac{x - \left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right] \, \alpha^b_\mathrm{t\;h_1} }
 {\left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right] \, \sigma} \right) \;.$

Its mean and variance are

$\displaystyle \hspace{-1.5cm} \mu_{\Delta U}$ $\displaystyle =
 \sum_{h_1} \sum_{h_2} p_{h_1} \, p_{h_2} \,
 \left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right] \, \alpha^b_\mathrm{t\;h_1}$

$\displaystyle \hspace{-1.5cm} \sigma_{\Delta U}^2$ $\displaystyle =
 \sum_{h_1} \sum_{h_2} p_{h_1} \, p_{h_2} \, \left\{
 \left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right]^2 \sigma^2 +
 \left\{\left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right] \, \alpha^b_\mathrm{t\;h_1} - \mu_{\Delta U}\right\}^2\right\} \; .$

The considerations made earlier about the central limit theorem apply also to $ \Delta U$. Thus, we find that $ \overline{\Delta U}$ is approximately Gaussian with parameters $ \mu_{\overline{\Delta U}} = \mu_{\Delta U}$ and $ \sigma_{\overline{\Delta U}}^2 = \sigma_{\Delta U}^2 / J$.

As $ \bar U'_{\hat r}$ and $ \overline{\Delta U}$ are independent and normally distributed, $ \bar U''_{\hat r} = \bar U'_{\hat r} + \overline{\Delta U}$ is also normally distributed. Its parameters are: $ \mu_{\bar U''_{\hat r}} = \mu_{\bar U'_{\hat r}} + \mu_{\overline{\Delta U}}$ and $ \sigma_{\bar U''_{\hat r}}^2 =\sigma_{\bar U'_{\hat r}}^2 + \sigma_{\overline{\Delta U}}^2$.

For the non targets, the distribution of $ \bar U''_{\mathrm{max}} = \bar U'_{\mathrm{max}} = \max\{\bar U'_*\}$ can be found using the same procedure as in section 4.2 obtaining $ \mathrm{F}_{\bar U''_{\mathrm{max}}}(x) = \left[\mathrm{F}_{\bar U'_\ast}(x)\right]^5$.


Model of Donchin speller accuracy in the presence of $h$-related effects

We can find the probability that $ \bar U''_{\hat r} > \bar U''_{\mathrm{max}}$ using the same procedure as in section 4.2 obtaining the equivalent of (11).

The very last step is to derive the probability of the conjunction on the right-hand side of (35): $ \mathrm{Pr}\left(\bar U''_{\hat r} > \bar U''_{\mathrm{max}} \: \wedge \: \bar V''_{\hat c} > \bar V''_{\mathrm{max}}\right)$. Even though the statement (35) is formally identical to (9), with $ \bar U''_{\hat r}$ replacing $ \bar U_{\hat r}$ and $ \bar V''_{\hat c}$ replacing $ \bar V_{\hat c}$, we cannot directly write the equivalent of (10) because $ \bar U''_{\hat r}$ and $ \bar V''_{\hat c}$ are not independent. In fact, they both depend on both $ \bar U'_{\hat r}$ and $ \bar V'_{\hat c}$. Their dependence is through $ \Delta U$ and $ \Delta V$, whose means $ \mu_{\Delta U}$ and $ \mu_{\Delta V}$ (as shown in A.2) contain the factor $ \left[\mathrm{w}(h_1)-\mathrm{w}(h_2)\right] \alpha^b_\mathrm{t\;h_1}$. However, if we restrict the search space to values of $ \mathrm{w}(h)$ and $ b$ (which affects $ \alpha^b_\mathrm{t\;h}$) such that $ \mathrm{w}(h)$ is never too different from 1 and $ \alpha^b_\mathrm{t\;h}$ is not excessively large, we can assume the independence of $ \bar U''_{\hat r}$ and $ \bar V''_{\hat c}$.

Under this assumption we can approximate $ \mathrm{Pr}\left(\bar U''_{\hat r} > \bar U''_{\mathrm{max}} \: \wedge \: \bar V''_{\hat c} > \bar V''_{\mathrm{max}}\right)$ as $ \mathrm{Pr}\left(\bar U''_{\hat r} > \bar U''_{\mathrm{max}}\right)^2$, obtaining:

\begin{displaymath}\begin{split}
 \hspace{-1cm} A = \;
 &\mathrm{Pr}\left(\bar S_{\hat r, \hat c} > \max \left\{ \{\bar S_{\hat r,\ast}\} \cup \{\bar S_{\ast, \hat c}\} \right\}\right) \\
 &= \left\{ \int_{\mathbb{R}} \mathrm{f}_{\bar U''_{\hat r}}(x) \left[ \mathrm{F}_{\bar U'_\ast}(x) \right]^5 \;\mathrm{d}{x} \right\}^2 \; .
 \end{split}\end{displaymath} (40)

This equation models the accuracy of the Donchin speller in the presence of $h$-related effects, when (16) is used to score the characters. It is a special case of (13) with $ \beta = \frac{\sigma_{\bar U''_{\hat r}}}{\sigma_{\bar U''_\ast}}$, $ \gamma = \frac{\mu_{\bar U''_{\hat r}} - \mu_{\bar U''_\ast}}{\sigma_{\bar U''_\ast}}$, $ \eta = 5$ and $ \lambda = 2$ (case (f) in table 1).
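Numerically, (40) is a one-dimensional integral. The sketch below evaluates it with scipy, with placeholder values standing in for the means and standard deviations of $ \bar U''_{\hat r}$ and $ \bar U'_\ast$ derived above:

\begin{verbatim}
from scipy import integrate, stats

mu1, s1 = 1.1, 0.45    # placeholder mean and std of U''_rhat (after averaging)
mu0, s0 = -0.5, 0.40   # placeholder mean and std of U'_* (after averaging)

integrand = lambda x: stats.norm.pdf(x, mu1, s1) * stats.norm.cdf(x, mu0, s0) ** 5
prob_row, _ = integrate.quad(integrand, mu1 - 8 * s1, mu1 + 8 * s1)
A = prob_row ** 2      # both the target row and the target column must win
print(f"predicted accuracy: {A:.3f}")
\end{verbatim}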


Probability distribution of a function of a discrete and a continuous random variable

Theorem 1   Let us consider two independent random variables, $ H$ and $ Y$, and the function $ \phi(h,y)$. Let $ H$ be a discrete variable with values in $ {\cal H} = \{ h_1, \dots, h_N\}$ and probability masses $ p_i = \mathrm{Pr}\left(H=h_i\right)$. Let $ Y$ be a continuous variable with cumulative distribution function $ \mathrm{F}_{Y}(y)$. If $ \phi_i(y)=\phi(h_i,y)$ is strictly increasing in $ y$ for all $ i \in \{1,\dots,N\}$, then the cumulative distribution function of $ U=\phi(H,Y)$ is $ \mathrm{F}_{U}(u) = \sum_i p_i   \mathrm{F}_{Y}(\phi^{-1}_i(u))$.

Proof. Given a generic proposition $ {\cal P}$ which depends on the variables $ H$ and $ Y$, the probability of $ {\cal P}$ being true is given by

$\displaystyle \mathrm{Pr}\left({\cal P}\right) = \iint_{\Gamma} \mathrm{f}_{H,Y}(h,y)   dh   dy \;,$    

where $ \mathrm{f}_{H,Y}(h,y)$ is the joint probability density function of $ H$ and $ Y$ and $ \Gamma$ is the region where $ {\cal P}$ holds.

Therefore, $ \mathrm{F}_{U}(u)=\mathrm{Pr}\left(U \le u\right)$ can be expressed as

$\displaystyle \mathrm{F}_{U}(u) = \mathrm{Pr}\left(\phi(H,Y) \le u\right) = \iint_{\Gamma} \mathrm{f}_{H,Y}(h,y)   dh   dy \;,$ (41)

where $ \Gamma$ is the region of the $h$-$ y$ plane where $ \phi(h,y) \le u$.

The independence of $ H$ and $ Y$, and the fact that $ H$ is discrete, imply that

$\displaystyle \mathrm{f}_{H,Y}(h,y) = \mathrm{f}_{Y}(y) \: \sum_i p_i   \delta(h_i - h) \;,$    

where $ \delta(x)$ is the Dirac delta function and $ \mathrm{f}_{Y}(y)$ is the probability density function of $ Y$.

Using Iverson brackets (see A.1) we can rewrite (41) as

\begin{displaymath}\begin{split}
 \mathrm{F}_{U}(u) &= \int_{\mathbb{R}} \int_{\mathbb{R}} \textsl{[}\phi(h,y) \le u\textsl{]} \; \mathrm{f}_{H,Y}(h,y) \; dy \; dh \\
 &= \int_{\mathbb{R}} \sum_i p_i \, \delta(h_i - h) \int_{\mathbb{R}} \textsl{[}\phi(h,y) \le u\textsl{]} \; \mathrm{f}_{Y}(y) \; dy \; dh \; .
 \end{split}\end{displaymath} (42)

The outer integral performs the convolution between the Dirac function and the inner integral. As

$\displaystyle \int_{\mathbb{R}} \delta(t - \tau)   \psi(\tau)   d\tau = \psi(t) \;,$    

(42) is equivalent to

$\displaystyle \mathrm{F}_{U}(u) = \sum_i p_i \int_{\mathbb{R}} \textsl{[}\phi(h_i,y) \le u\textsl{]}   \mathrm{f}_{Y}(y)   dy \; .$ (43)

Each $ h_i$ corresponds to a function $ \phi_i(y)=\phi(h_i,y)$. As $ \phi_i(y)$ is strictly increasing, its inverse exists and is increasing, too. Therefore,

$\displaystyle \phi(h_i,y) \le u \quad \Longleftrightarrow \quad y \le \phi^{-1}_i(u) \; .$    

Then, (43) becomes

$\displaystyle \mathrm{F}_{U}(u) = \sum_i p_i \int_{\mathbb{R}} \textsl{[}y \le \phi^{-1}_i(u)\textsl{]} \; \mathrm{f}_{Y}(y) \; dy =
 \sum_i p_i \, \mathrm{F}_{Y}(\phi^{-1}_i(u)) \;,$

which proves the theorem. $ \qedsymbol$
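The theorem is easy to check by Monte Carlo for a simple choice of $ \phi$. The sketch below uses $ \phi(h_i,y) = w_i \, y$ with $ w_i > 0$ (strictly increasing in $y$) and a standard normal $ Y$; all numerical values are illustrative:

\begin{verbatim}
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p     = np.array([0.2, 0.3, 0.5])   # probability masses of H
w_tab = np.array([1.3, 1.0, 0.9])   # phi(h_i, y) = w_tab[i] * y

H = rng.choice(3, size=200_000, p=p)
U = w_tab[H] * rng.standard_normal(200_000)   # U = phi(H, Y)

u = 0.7
print(np.mean(U <= u))                        # empirical F_U(u)
print(np.sum(p * stats.norm.cdf(u / w_tab)))  # sum_i p_i F_Y(phi_i^{-1}(u))
\end{verbatim}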


Mean and variance of a variable whose probability density function (pdf) is the weighted sum of several pdfs

Theorem 2   Let us consider a random variable, $ X$, with probability density function

$\displaystyle \mathrm{f}(x) = \sum_i \, p_i \, \mathrm{f}_{i}(x) \; .$

Let $ \mu_i$ be the mean and $ \sigma^2_i$ the variance of a variable whose pdf is $ \mathrm{f}_{i}(x)$. Then, the mean $ \mu$ and the variance $ \sigma^2$ of $ X$ are:

$\displaystyle \mu$ $\displaystyle = \sum_i p_i   \mu_i$ (44)
$\displaystyle \sigma^2$ $\displaystyle = \sum_i p_i   [\sigma^2_i + (\mu_i-\mu)^2 ]   .$ (45)

Proof. From the definition of mean it follows that

$\displaystyle \mu = \int_{\mathbb{R}} x \, \mathrm{f}(x) \; dx = \int_{\mathbb{R}} x \sum_i p_i \, \mathrm{f}_{i}(x) \; dx =
 \sum_i p_i \, \int_{\mathbb{R}} x \, \mathrm{f}_{i}(x) \; dx = \sum_i p_i \, \mu_i \;,$

which proves (44).

From the definition of variance, we have

$\displaystyle \sigma^2 = \int_{\mathbb{R}} (x - \mu)^2 \, \mathrm{f}(x) \; dx
 = \sum_i p_i \int_{\mathbb{R}} \left[ (x - \mu_i) + (\mu_i - \mu) \right]^2 \mathrm{f}_{i}(x) \; dx
 = \sum_i p_i \, \left[ \sigma^2_i + (\mu_i-\mu)^2 \right] \;,$

where the cross term vanishes because $ \int_{\mathbb{R}} (x - \mu_i) \, \mathrm{f}_{i}(x) \; dx = 0$. This proves (45). $ \qedsymbol$


Explaining and modelling TTI-related modulations of flash scores produced by non-target stimuli

In section 5.3 we found that the average flash score for ERPs in partition nt01 (which gathers the responses to the second non-target following a target) was lower than for other non-target partitions. In this appendix we look at possible causes of TTI-related modulations of flash scores in the presence of non-targets and how the theory and methods developed in the paper would be affected by the use of (15).

When stimuli are presented at a sufficiently fast pace, the P300 produced in response to a target stimulus will appear within the epochs corresponding to the presentation of one or more non-targets following that target thereby deforming ERPs. Naturally, not all such deformations will influence a flash scorer. However, some may do so.

The SVM classifier used in the first stage of our Donchin speller weighs the contribution of the different EEG samples in order to maximise the distance between targets and non-targets. For simplicity let us think of it as a kind of matched filter looking for P300-like shapes, i.e., positive deflections of the EEG w.r.t. the baseline in a time window of around 300-500 ms after stimulus presentation. With the standard stimulus presentation rate of Donchin's speller, for the epochs in the nt01 partition, the P300 induced by the preceding target (the target epoch that started $ 2 \times \mathrm{SOA} = 350\,\mathrm{ms}$ before an nt01 epoch) shifts the baseline of the nt01 epochs positively, thereby significantly deforming the ERP and making it less likely to produce a high flash score. Figure 9 illustrates the idea using a simple synthetic signal and an elementary classifier. We believe that this is the most likely reason for nt01 having more negative scores than other non-target partitions in our tests.

Figure 9: Possible explanation for nt01 being more negative than the other non-targets. For simplicity, let us imagine that no ERPs are produced by non-target stimuli. When a target flash is presented to a subject, we observe a positive deflection of the EEG signal approximately 300-500 ms after the beginning of the epoch starting when the target event occurs. For simplicity, let us imagine that our flash scorer simply returns the difference between the EEG at time 400 ms and the EEG at time 0 ms (relative to the start of the epoch). With our stylised EEG signal, this algorithm tends to give a positive score to target epochs and a zero score to non-target ones. In the case of nt01, however, the EEG at time 0 ms of an epoch corresponds to the EEG at time $ 2 \times \mathrm{SOA} = 350\,\mathrm{ms}$ of the target epoch acquired two flashes earlier. This causes the voltage at 0 ms to be significantly higher than that at 400 ms. In the presence of such an unusual non-target ERP, our simple difference-based scorer will give nt01 epochs an unusually low score.
\includegraphics[width=.4\textwidth]{WhyNT01Diff}
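The mechanism illustrated in figure 9 can be reproduced with a few lines of Python. The stylised P300 below is a Gaussian bump peaking 400 ms after the target flash, and the difference-based scorer is the elementary classifier of the caption; all numerical values are illustrative:

\begin{verbatim}
import numpy as np

fs, soa = 1000, 0.175                # 1 kHz synthetic signal; SOA of 175 ms
t = np.arange(0, 2.0, 1 / fs)
eeg = np.exp(-0.5 * ((t - 0.4) / 0.08) ** 2)  # P300 of a target flashed at t = 0

def diff_scorer(onset):              # score = EEG(onset + 400 ms) - EEG(onset)
    return eeg[int((onset + 0.4) * fs)] - eeg[int(onset * fs)]

print("target flash score:", diff_scorer(0.0))      # large and positive
print("nt01 flash score  :", diff_scorer(2 * soa))  # negative: starts on P300 tail
\end{verbatim}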

Of course, when using a different timing for stimuli, one might find that other non-target partitions are affected by this effect. The effect may also depend on the particular choice of stimulus paradigm and flash-scoring algorithm. So, we cannot exclude that in different P300-based BCIs a more marked TTI modulation of non-target ERPs might be observed. For these reasons, it is interesting to consider how the theory and algorithms presented in this paper would need to be modified if we wanted to also model and exploit such modulations.

Building a model where the score distribution of the non-targets depends on $h$ would be conceptually straightforward and would pose no new challenges. The model for non-targets could be expressed using (15). Apart from small formal changes in the formulae, the main substantial difference is that (21) and (22) should be rewritten as

$\displaystyle S_{\hat r,\ast}
 = \mathrm{w}(H^{\Romannum{3,\ast}}) \left[ \alpha^b_\mathrm{t\;H^{\Romannum{1}}} + \mathring Y^{\Romannum{1}} \right]
 + \mathrm{w}(H^{\Romannum{4,\ast}}) \left[ \alpha^b_{\mathrm{nt}\;H^{\Romannum{9}}} + \mathring Y^{\Romannum{3,\ast}} \right]$ (46)

and

$\displaystyle S_{\ast, \hat c}
 = \mathrm{w}(H^{\Romannum{5,\ast}}) \left[ \alpha^b_{\mathrm{nt}\;H^{\Romannum{10}}} + \mathring Y^{\Romannum{4,\ast}} \right]
 + \mathrm{w}(H^{\Romannum{6,\ast}}) \left[ \alpha^b_\mathrm{t\;H^{\Romannum{2}}} + \mathring Y^{\Romannum{2}} \right] \, ,$ (47)

where the variables $ H^{\Romannum{9}}$ and $ H^{\Romannum{10}}$ determine which values of $ \alpha_\mathrm{nt\;h}$ to use. Please note that, unlike the target case where the same $ H$ selects both the value of $ \alpha_\mathrm{t\;h}$ and $ \mathrm{w}(h)$, in this case the pair of variables $ H^{\Romannum{4,\ast}}$ and $ H^{\Romannum{9}}$ and the pair of variables $ H^{\Romannum{5,\ast}}$ and $ H^{\Romannum{10}}$ are different i.i.d. variables.

The second term in (46) was called $ V'_{\ast}$ in A. Earlier its mean, $ \mu_{V'_\ast}$, was found to be $ \mathrm{E}[\mathrm{w}(H^{\Romannum{4,\ast}})] \: \alpha^b_{\mathrm{nt}}$. When using (46), $ \mu_{V'_\ast}= \mathrm{E}[\mathrm{w}(H^{\Romannum{4,\ast}}) \: \alpha^b_{\mathrm{nt}\;H^{\Romannum{9}}}]$. But, as $ H^{\Romannum{4,\ast}}$ and $ H^{\Romannum{9}}$ are independent, this reduces to the product $ \mathrm{E}[\mathrm{w}(H^{\Romannum{4,\ast}})] \times \mathrm{E}[\alpha^b_{\mathrm{nt}\;H^{\Romannum{9}}}]$, where the second factor is the average non-target score among the partitions, i.e., $ \alpha^b_{\mathrm{nt}}$. In other words, even with a more sophisticated model of non-target scores, $ \mu_{V'_\ast}=\mathrm{E}[\mathrm{w}(H^{\Romannum{4,\ast}})] \: \alpha^b_{\mathrm{nt}}$. The same applies to the first term in (47). This shows that a model that assumes the dependence of the non-target score distribution on $h$ would make the mathematical formulation more complicated without giving any substantial advantage.

References

1
Dornhege G, del R Millán J, Hinterberger T, McFarland D J and Müller K R (eds) 2007 Toward Brain-Computer Interfacing (The MIT Press)

2
Allison B Z, Wolpaw E W and Wolpaw J R 2007 Brain-computer interface systems: progress and prospects. Expert review of medical devices 4 463-474

3
Cinel C, Poli R and Citi L 2004 Possible sources of perceptual errors in P300-based speller paradigm Biomedizinische technik 49(1) 39-40 Proceedings of 2nd International BCI workshop and Training Course

4
Citi L, Poli R, Cinel C and Sepulveda F 2008 P300-based BCI mouse with genetically-optimized analogue control IEEE transactions on neural systems and rehabilitation engineering 16 51-61

5
Hansen J C 1983 Separation of overlapping waveforms having known temporal distributions. Journal of neuroscience methods 9 127-139

6
Zhang J 1998 Decomposing stimulus and response component waveforms in ERP Journal of neuroscience methods 80 49-63

7
Spencer K M 2004 Averaging, detection and classification of single-trial ERPs Event-related potentials. A method handbook ed Handy T C (MIT Press) chap 10

8
Luck S J 2005 An introduction to the event-related potential technique (Cambridge, Massachusetts: MIT Press)

9
Poli R, Cinel C, Citi L and Sepulveda F 2010 Reaction-time binning: a simple method for increasing the resolving power of ERP averages Psychophysiology 47(3) 467-485

10
Citi L, Poli R and Cinel C 2009 High-significance averages of event-related potential via genetic programming Genetic Programming Theory and Practice VII Genetic and Evolutionary Computation ed Riolo R L, O'Reilly U M and McConaghy T (Ann Arbor: Springer) pp 135-157

11
Farwell L A and Donchin E 1988 Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and clinical neurophysiology 70 510-523

12
Allison B Z, McFarland D J, Schalk G, Zheng S D, Jackson M M and Wolpaw J R 2008 Towards an independent brain-computer interface using steady state visual evoked potentials. Clinical neurophysiology 119 399-408

13
Hong B, Guo F, Liu T, Gao X and Gao S 2009 N200-speller using motion-onset visual response. Clinical neurophysiology 120 1658-1666

14
Donchin E and Coles M G H 1988 Is the P300 a manifestation of context updating? Behavioral and brain sciences 11 355-372

15
Polich J 2004 Neuropsychology of P3a and P3b: A theoretical overview Brainwaves and mind: recent developments ed Moore N C and Arikan K (Kjellberg Inc.) pp 15-29

16
Polich J 2007 Updating P300: an integrative theory of P3a and P3b. Clinical neurophysiology 118 2128-2148

17
Polich J and Kok A 1995 Cognitive and biological determinants of P300: an integrative review. Biological psychology 41 103-146

18
Hagen G F, Gatherwright J R, Lopez B A and Polich J 2006 P3a from visual stimuli: task difficulty effects. International journal of psychophysiology 59 8-14

19
Sellers E W, Krusienski D J, McFarland D J, Vaughan T M and Wolpaw J R 2006 A P300 event-related potential brain-computer interface (BCI): the effects of matrix size and inter stimulus interval on performance. Biological psychology 73 242-252

20
Allison B Z and Pineda J A 2003 ERPs evoked by different matrix sizes: implications for a brain computer interface (BCI) system. IEEE transactions on neural systems and rehabilitation engineering 11 110-113

21
Polich J 1990 Probability and inter-stimulus interval effects on the P300 from auditory stimuli. International journal of psychophysiology 10 163-170

22
Fitzgerald P G and Picton T W 1981 Temporal and sequential probability in evoked potential studies. Canadian journal of psychology 35 188-200

23
Allison B Z and Pineda J A 2006 Effects of SOA and flash pattern manipulations on ERPs, performance, and preference: implications for a BCI system. International journal of psychophysiology 59 127-140

24
Gonsalvez C L and Polich J 2002 P300 amplitude is determined by target-to-target interval. Psychophysiology 39 388-396

25
Croft R J, Gonsalvez C J, Gabriel C and Barry R J 2003 Target-to-target interval versus probability effects on P300 in one- and two-tone tasks. Psychophysiology 40 322-328

26
Squires K C, Wickens C, Squires N K and Donchin E 1976 The effect of stimulus sequence on the waveform of the cortical event-related potential. Science (New York, N.Y.) 193 1142-1146

27
Gonsalvez C J, Gordon E, Anderson J, Pettigrew G, Barry R J, Rennie C and Meares R 1995 Numbers of preceding nontargets differentially affect responses to targets in normal volunteers and patients with schizophrenia: a study of event-related potentials. Psychiatry research 58 69-75

28
Bonala B, Boutros N N and Jansen B H 2008 Target probability affects the likelihood that a P300 will be generated in response to a target stimulus, but not its amplitude. Psychophysiology 45 93-99

29
Rakotomamonjy A and Guigue V 2008 BCI competition III: dataset II- ensemble of SVMs for BCI P300 speller IEEE transactions on bio-medical engineering 55 1147-1154

30
Martens S M M, Hill N J, Farquhar J and Schölkopf B 2009 Overlap and refractory effects in a brain-computer interface speller based on the visual P300 event-related potential. Journal of neural engineering 6 026003

31
Citi L, Poli R and Cinel C 2009 Exploiting P300 amplitude variations can improve classification accuracy in Donchin's BCI speller 4th International IEEE EMBS Conference on Neural Engineering (Antalya) pp 478-481

32
Salvaris M and Sepulveda F 2009 Perceptual errors in the Farwell and Donchin matrix speller 4th International IEEE EMBS Conference on Neural Engineering (Antalya, Turkey) pp 275 - 278

33
Blankertz B, Müller K R, Krusienski D J, Schalk G, Wolpaw J R, Schlögl A, Pfurtscheller G, del R Millán J, Schröder M and Birbaumer N 2006 The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE transactions on neural systems and rehabilitation engineering 14 153-159

34
Kaper M, Meinicke P, Grossekathoefer U, Lingner T and Ritter H 2004 BCI competition 2003-data set IIb: support vector machines for the P300 speller paradigm. IEEE transactions on bio-medical engineering 51 1073-1076

35
Guan C, Thulasidas M and Wu J 2004 High performance P300 speller for brain-computer interface IEEE international workshop on biomedical circuits and systems pp S3/5/INV-S3/13-16

36
Bayliss J D 2003 Use of the evoked potential P3 component for control in a virtual apartment. IEEE transactions on neural systems and rehabilitation engineering 11 113-116

37
Wilk M B and Gnanadesikan R 1968 Probability plotting methods for the analysis of data. Biometrika 55 1-17

38
Kübler A, Mushahwar V K, Hochberg L R and Donoghue J P 2006 BCI meeting 2005-workshop on clinical issues and applications. IEEE transactions on neural systems and rehabilitation engineering 14 131-134

39
Rakotomamonjy A and Guigue V online at http://asi.insa-rouen.fr/enseignants/arakotom/code/bciindex.html

40
Hill J, Farquhar J, Martens S M, Biessmann F and Schölkopf B 2008 Effects of stimulus type and of Error-Correcting code design on BCI speller performance The 22nd Annual Conference on Neural Information Processing Systems (Vancouver)

41
Brunner P, Joshi S, Briskin S, Wolpaw J, Bischof H and Schalk G 2010 Does the ``P300'' speller depend on eye gaze? Tools for Brain-Computer Interaction (TOBI) Workshop on Integrating brain-computer interfaces with conventional assistive technology (Graz) (poster)

42
Treder M S and Blankertz B 2010 (C)overt attention and visual speller design in an ERP-based Brain-Computer Interface Behavioral and brain functions 6 28

43
Schreuder M, Blankertz B and Tangermann M 2010 A new auditory multi-class Brain-Computer Interface paradigm: spatial hearing as an informative cue PLoS One 5 e9813

44
Brouwer A M and van Erp J B F 2010 A tactile P300 Brain-Computer Interface. Frontiers in neuroscience 4 19

45
Ulyanov V V 1981 An estimate for the rate of convergence in the central limit theorem in a real separable Hilbert space Mathematical notes 29 78-82 translated from Matematicheskie Zametki, Vol. 29, No. 1, pp. 145-153, January, 1981.
