{"title": "Dependent Dirichlet Process Spike Sorting", "book": "Advances in Neural Information Processing Systems", "page_first": 497, "page_last": 504, "abstract": "In this paper we propose a new incremental spike sorting model that automatically eliminates refractory period violations, accounts for action potential waveform drift, and can handle appearance\" and \"disappearance\" of neurons. Our approach is to augment a known time-varying Dirichlet process that ties together a sequence of infinite Gaussian mixture models, one per action potential waveform observation, with an interspike-interval-dependent likelihood that prohibits refractory period violations. We demonstrate this model by showing results from sorting two publicly available neural data recordings for which the a partial ground truth labeling is known.\"", "full_text": "Dependent Dirichlet Process Spike Sorting\n\nJan Gasthaus, Frank Wood, Dilan G\u00a8or\u00a8ur, Yee Whye Teh\n\nGatsby Computational Neuroscience Unit\n\n{j.gasthaus, fwood, dilan, ywteh}@gatsby.ucl.ac.uk\n\nUniversity College London\nLondon, WC1N 3AR, UK\n\nAbstract\n\nIn this paper we propose a new incremental spike sorting model that automatically\neliminates refractory period violations, accounts for action potential waveform\ndrift, and can handle \u201cappearance\u201d and \u201cdisappearance\u201d of neurons. Our approach\nis to augment a known time-varying Dirichlet process that ties together a sequence\nof in\ufb01nite Gaussian mixture models, one per action potential waveform observation,\nwith an interspike-interval-dependent likelihood that prohibits refractory period\nviolations. We demonstrate this model by showing results from sorting two publicly\navailable neural data recordings for which a partial ground truth labeling is known.\n\n1 Introduction\n\nSpike sorting (see [1] and [2] for review and methodological background) is the name given to the\nproblem of grouping action potentials by source neuron. 
Generally speaking, spike sorting involves a sequence of steps: 1) recording the activity of an unknown number of neurons using some kind of extra-cellular recording device, 2) detecting the times at which action potentials are likely to have occurred, 3) slicing action potential waveforms from the surrounding raw voltage trace where action potentials were posited to have occurred, 4) (often) performing some kind of dimensionality reduction/feature extraction on the set of collected action potential waveform snippets, 5) running a clustering algorithm to produce a grouping of action potentials attributed to a single neuron, and finally 6) running some kind of post hoc algorithm that detects refractory period violations and thins or adjusts the clustering results accordingly.\nNeuroscientists are interested in arriving at the optimal solution to this problem. Towards this end they have traditionally utilized maximum likelihood clustering methods such as expectation maximization for finite Gaussian mixture models with cross-validation model selection. This of course allows them to arrive at an optimal solution, but it is difficult to say whether or not it is the optimal solution, and it affords them no way of establishing the level of confidence they should have in their result. Recently several groups have suggested a quite different approach to this problem which eschews the quest for a single optimal solution in favor of a Bayesian treatment of the problem [3, 4, 5, 6]. In each of these, instead of pursuing the optimal sorting, multiple sortings of the spikes are produced (in fact what each model produces is a posterior distribution over spike trains). 
Neural data analyses may then be averaged over the resulting spike train distribution to account for uncertainties that may have arisen at various points in the spike sorting process and would not have been explicitly accounted for otherwise.\nOur work builds on this new Bayesian approach to spike sorting, going beyond these models in the way steps five and six are accomplished. Specifically, we apply the generalized Polya urn dependent Dirichlet process mixture model (GPUDPM) [7, 8] to the problem of spike sorting and show how it allows us to model waveform drift and account for neuron appearance and disappearance. By introducing a time dependent likelihood into the model we are also able to eliminate refractory period violations.\nThe need for a spike sorting approach with these features arises from several domains. Waveform non-stationarities, either due to changes in the recording environment (e.g. movement of the electrode) or due to changes in the firing activity of the neuron itself (e.g. burstiness), cause almost all current spike sorting approaches to fail. This is because most pool waveforms over time, discarding the time at which the action potentials were observed. A notable exception is the spike sorting approach of [9], in which waveforms are pooled and clustered in short fixed time intervals. Multiple Gaussian mixture models are fit to the waveforms in each interval and are then pruned and smoothed until a single coherent sequence of mixture models is left that describes the entire time course of the data. This is accomplished by using a forward-backward-like algorithm and the Jensen-Shannon divergence between models in consecutive intervals. 
Although very good results can be produced by such a model, using it requires choosing values for a large number of parameters, and, as it is a smoothing algorithm, it requires the entire data set to have been observed already.\nA recent study by [10] puts forward a compelling case for online spike sorting algorithms that can handle waveform non-stationarity, as well as sudden jumps in waveform shape (e.g. abrupt electrode movements due to high acceleration events), and appearance and disappearance of neurons from the recording over time. This paper introduces a chronic recording paradigm in which a chronically implanted recording device is mated with appropriate storage such that very long term recordings can be made. Unfortunately, as the animal being recorded from is allowed its full range of natural movements, accelerations may cause the signal characteristics of the recording to vary dramatically over short time intervals. As such data theoretically can be recorded forever without stopping, forward-backward spike sorting algorithms such as that in [9] are ruled out. As far as we know, our proposed model is the only sequential spike sorting model that meets all of the requirements of this new and challenging spike sorting problem.\nIn the next sections we review the GPUDPM on which our spike sorting model is based, introduce the specifics of our spike sorting model, then demonstrate its performance on real data for which a partial ground truth labeling is known.\n\n2 Review\n\nOur model is based on the generalized Polya urn Dirichlet process mixture model (GPUDPM) described in [7, 8]. The GPUDPM is a time dependent Dirichlet process (DDP) mixture model formulated in the Chinese restaurant process (CRP) sampling representation of a Dirichlet process mixture model (DPM). 
We will first very briefly review DPMs in general and then turn to the specifics of the GPUDPM.\nDPMs are a widely used tool for nonparametric density estimation and unsupervised learning in models where the true number of latent classes is unknown. In a DPM, the mixing distribution G is distributed according to a DP with base distribution G_0, i.e.\n\nG | \u03b1, G_0 \u223c DP(\u03b1, G_0)    (1)\n\u03b8_i | G \u223c G,    x_i | \u03b8_i \u223c F(\u03b8_i)    (2)\n\nPlacing a DP prior over G induces a clustering tendency amongst the \u03b8_i. If the \u03b8_i take on K distinct values \u03c6_1, . . . , \u03c6_K, we can write out an equivalent model using indicator variables c_i \u2208 {1, . . . , K} that assign data points to clusters. In this representation we track the distinct \u03c6_k drawn from G_0 for each cluster, and use the Chinese restaurant process to sample the conditional distributions of the indicator variables c_i:\n\nP(c_i = k | c_1, . . . , c_{i-1}) = m_k / (i - 1 + \u03b1)    for k \u2208 {c_j : j < i}\nP(c_i \u2260 c_j for all j < i | c_1, . . . , c_{i-1}) = \u03b1 / (i - 1 + \u03b1)\n\nwhere m_k = #{c_j : c_j = k \u2227 j < i}.\nThe GPUDPM consists of T individual DPMs, one per discrete time step t = 1, . . . , T, all tied together through a particular way of sharing the component parameters \u03c6^t_k and table occupancy counts m^t_k between adjacent time steps (here t indexes the parameters and cluster sizes of the T DPMs).\nDependence among the m^t_k is introduced by perturbing the number of customers sitting at each table when moving forward through time. Denote by m^t = (m^t_1, . . . , m^t_{K^t}) the vector containing the number of customers sitting at each table at time t before a \u201cdeletion\u201d step, where K^t is the number of non-empty tables at time t. 
Similarly denote by m^{t+1} the same quantity after this deletion step. Then the perturbation of the class counts from one step to the next is governed by the process\n\nm^{t+1} | m^t, \u03c1 \u223c { m^t \u2212 \u03be^t with probability \u03b3;  m^t \u2212 \u03b6^t with probability 1 \u2212 \u03b3 }    (3)\n\nwhere \u03be^t_k \u223c Binomial(m^t_k, 1 \u2212 \u03c1), and \u03b6^t_\u2113 = m^t_\u2113, \u03b6^t_j = 0 for j \u2260 \u2113, where \u2113 \u223c Discrete(m^t / \u03a3_{k=1}^{K^t} m^t_k). Before seating the customers arriving at time step t + 1, the number of customers sitting at each table is initialized to m^{t+1}. This perturbation process can either remove some number of customers from a table or effectively delete a table altogether. This deletion procedure accounts for the ability of the GPUDPM to model births and deaths of clusters.\nThe GPUDPM is also capable of modeling drifting cluster parameters. This drift is modeled by tying together the component parameters \u03c6^t_k through a transition kernel P(\u03c6^t_k | \u03c6^{t-1}_k) from which the class parameter at time t is sampled given the class parameter at time t \u2212 1. For various technical reasons one must ensure that the mixture component parameters \u03c6^t_k are all drawn independently from G_0, i.e. {\u03c6^t_k}_{t=1}^T \u223c G_0. This can be achieved by ensuring that G_0 is the invariant distribution of the transition kernel P(\u03c6^t_k | \u03c6^{t-1}_k) [8].\n\n3 Model\n\nIn order to apply the GPUDPM model to spike sorting problems one first has to make a number of modeling assumptions. 
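The count perturbation in Eq. (3) can be sketched as follows. This is a minimal NumPy sketch of our reading of the deletion step, not the authors' released code; `perturb_counts` and its signature are our own naming.

```python
import numpy as np

def perturb_counts(m, rho, gamma, rng):
    """One 'deletion' step of the generalized Polya urn (hypothetical helper).

    m: per-table customer counts m^t. With probability gamma every table is
    thinned binomially (each customer kept with probability rho); otherwise
    one table, chosen with probability proportional to its size, is emptied.
    Returns the perturbed counts m^{t+1}.
    """
    m = np.asarray(m)
    if rng.random() < gamma:
        # xi_k ~ Binomial(m_k, 1 - rho): remove some customers from each table
        xi = rng.binomial(m, 1.0 - rho)
        return m - xi
    # delete table l ~ Discrete(m / sum(m)) altogether (a cluster "death")
    l = rng.choice(len(m), p=m / m.sum())
    out = m.copy()
    out[l] = 0
    return out

rng = np.random.default_rng(0)
m_next = perturb_counts([50, 30, 20], rho=0.985, gamma=1 - 1e-5, rng=rng)
```

With \u03b3 close to 1 (as in the experiments below) the binomial-thinning branch dominates, so whole-table deletions are correspondingly rare.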
First is choosing a form for the likelihood function describing the distribution of action potential waveform shapes generated by a single neuron, P(x^t | c^t = k, \u03b8^t_k) (the distribution of which was denoted F(\u03b8^t_k) above), the prior over the parameters of that model (the base distribution G_0 above), and the transition kernel P(\u03c6^t_k | \u03c6^{t-1}_k) that governs how the waveshape of the action potentials emitted by a neuron can change over time. In the following we describe the modeling choices we made for the spike sorting task, as well as how the continuous spike occurrence times can be incorporated into the model to allow for correct treatment of neuron behaviour during the absolute refractory period.\nLet {x^t}_{t=1}^T be the set of action potential waveforms extracted from an extracellular recording (referred to as \u201cspikes\u201d in the following), and let \u03c4^1, . . . , \u03c4^T be the time stamps (in ms) associated with these spikes in ascending order (i.e. \u03c4^t \u2265 \u03c4^{t'} if t > t'). The model thus incorporates two different concepts of time: the discrete sequence of time steps t = 1, . . . , T corresponding to the time steps in the GPUDPM model, and the actual spike times \u03c4^t at which the spike x^t occurs in the recording. We assume that only one spike occurs per time step t, i.e. we set N = 1 in the model above and identify c^t = (c^t_1) = c^t.\nIt is well known that the distribution of action potential waveforms originating from a single neuron in a PCA feature space is well approximated by a Normal distribution [1]. We choose to model each dimension x_d (d \u2208 {1, . . . , D}) of the data independently with a univariate Normal distribution and use a product of independent Normal-Gamma priors as the base distribution G_0 of the DP:\n\nP(x | \u03c6) := N(x | \u03c6) = \u03a0_{d=1}^D N(x_d | \u00b5_d, \u03bb_d^{-1})    (4)\nG_0(\u00b5_0, n_0, a, b) := \u03a0_{d=1}^D [ N(\u00b5_d | \u00b5_{0,d}, (n_0 \u03bb_d)^{-1}) Ga(\u03bb_d | a, b) ]    (5)\n\nwhere \u03c6 = (\u03bb_1, . . . , \u03bb_D, \u00b5_1, . . . , \u00b5_D), and \u00b5_0 = (\u00b5_{0,1}, . . . , \u00b5_{0,D}), n_0, a, and b are parameters of the model. The independence assumption is made here mainly to increase computational efficiency. A model where P(x | \u03c6) is a multivariate Gaussian with full covariance matrix is also possible, but makes sampling from (7) computationally expensive. While correlations between the components can be observed in neural recordings, they can at least partially be attributed to temporal waveform variation.\nTo account for the fact that neurons have an absolute refractory period following each action potential during which no further action potential can occur, we extend the GPUDPM by conditioning the model on the spike occurrence times \u03c4^1, . . . , \u03c4^T and modifying the conditional probability of assigning a spike to a cluster given the other cluster labels and the spike occurrence times \u03c4^1, . . . , \u03c4^t in the following way:\n\nP(c^t = k | m^t, c^{1:t-1}, \u03c4^{1:t}, \u03b1) \u221d { 0 if \u03c4^t \u2212 \u02c6\u03c4^t_k \u2264 r_abs;  m^t_k if \u03c4^t \u2212 \u02c6\u03c4^t_k > r_abs and k \u2208 {1, . . . , K^{t-1}};  \u03b1 if \u03c4^t \u2212 \u02c6\u03c4^t_k > r_abs and k = K^{t-1} + 1 }    (6)\n\nwhere \u02c6\u03c4^t_k is the spike time of the last spike assigned to cluster k before time step t, i.e. \u02c6\u03c4^t_k = \u03c4^{t'}, t' = max{t'' | t'' < t \u2227 c^{t''} = k}. 
Essentially, the conditional probability of assigning the spike at time t to cluster k is zero if the difference of the occurrence time of this spike and the occurrence time of the last spike associated with cluster k is smaller than the refractory period r_abs. If the time difference is larger than r_abs then the usual CRP conditional probabilities are used. In terms of the Chinese restaurant metaphor, this setup corresponds to a restaurant in which seating a customer at a table removes that table as an option for new customers for some period of time. Note that this extension introduces additional dependencies among the indicator variables c^1, . . . , c^T.\nThe transition kernel P(\u03c6^t_k | \u03c6^{t-1}_k) specifies how the action potential waveshape can vary over time. To meet the technical requirements of the GPUDPM, and because its waveform drift modeling semantics are reasonable, we use the update rule of the Metropolis algorithm [11] as the transition kernel P(\u03c6^t_k | \u03c6^{t-1}_k), i.e. we set\n\nP(\u03c6^t_k | \u03c6^{t-1}_k) = S(\u03c6^{t-1}_k, \u03c6^t_k) A(\u03c6^{t-1}_k, \u03c6^t_k) + \u03b4_{\u03c6^{t-1}_k}(\u03c6^t_k) ( 1 \u2212 \u222b S(\u03c6^{t-1}_k, \u03c6') A(\u03c6^{t-1}_k, \u03c6') d\u03c6' )    (7)\n\nwhere S(\u03c6', \u03c6^t_k) is a (symmetric) proposal distribution and A(\u03c6', \u03c6^t_k) = min(1, G_0(\u03c6^t_k) / G_0(\u03c6')). We choose an isotropic Gaussian centered at the old value as the proposal distribution, S(\u03c6^{t-1}_k, \u00b7) = N(\u00b7 | \u03c6^{t-1}_k, \u03c3I). This choice of P(\u03c6^t_k | \u03c6^{t-1}_k) ensures that G_0 is the invariant distribution of the transition kernel, while at the same time allowing us to control the amount of correlation between time steps through \u03c3. 
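The transition kernel in Eq. (7) is a Metropolis update targeting G_0, which can be sketched as below. This is a hedged sketch, with a stand-in standard-normal log base density in place of the Normal-Gamma G_0 of Eq. (5); `metropolis_kernel_step` is our own hypothetical helper, not the authors' code.

```python
import numpy as np

def metropolis_kernel_step(phi, log_g0, sigma, rng):
    """One draw from the transition kernel P(phi^t | phi^{t-1}) of Eq. (7):
    propose from an isotropic Gaussian centred at the old value and accept
    with probability min(1, G0(phi') / G0(phi)); on rejection the old value
    is kept (the delta term in Eq. (7)).  Because the proposal is symmetric,
    this Metropolis update leaves G0 invariant."""
    proposal = phi + rng.normal(scale=np.sqrt(sigma), size=phi.shape)
    log_accept = min(0.0, log_g0(proposal) - log_g0(phi))
    if np.log(rng.random()) < log_accept:
        return proposal
    return phi

# Stand-in log base density (standard normal), NOT the Normal-Gamma G0 of Eq. (5).
log_g0 = lambda phi: -0.5 * float(phi @ phi)

rng = np.random.default_rng(0)
phi = np.zeros(3)
for _ in range(500):  # iterating the kernel keeps phi distributed (close to) G0
    phi = metropolis_kernel_step(phi, log_g0, sigma=0.01, rng=rng)
```

A small `sigma` yields strongly correlated consecutive draws, which is exactly the slow waveform drift the model is after.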
A transition kernel of this form allows the distribution of the action potential waveforms\nto vary slowly (if \u03c3 is chosen small) from one time step to the next both in mean waveform shape as\nwell as in variance. While small changes are preferred, larger changes are also possible if supported\nby the data.\nInference in this model is performed using the sequential Monte Carlo algorithm (particle \ufb01lter)\nde\ufb01ned in [7, 8].\n\nk|\u03c6t\u22121\n\n(7)\n\n)(cid:1).\n\nk\n\nk\n\n4 Experiments\n\n4.1 Methodology\n\nExperiments were performed on a subset of the publicly available1 data set described in [12, 13],\nwhich consists of simultaneous intracellular and extracellular recordings of cells in the hippocampus\nof anesthetized rats. Recordings from an extracellular tetrode and an intracellular electrode were\nmade simultaneously, such that the cell recorded on the intracellular electrode was also recorded\nextracellularly by a tetrode.\nAction potentials detected on the intracellular (IC) channel are an almost certain indicator that the\ncell being recorded spiked. Action potentials detected on the extracellular (EC) channels may include\nthe action potentials generated by the intracellularly recorded cell, but almost certainly include\nspiking activity from other cells as well. The intracellular recording therefore can be used to obtain\na ground truth labeling for the spikes originating from one neuron that can be used to evaluate the\nperformance of human sorters and automatic spike sorting algorithms that sort extracellular recordings\n[13]. However, by this method ground truth can only be determined for one of the neurons whose\nspikes are present in the extracellular recording, and this should be kept in mind when evaluating the\nperformance of spike sorting algorithms on such a data set. 
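The refractory-constrained assignment rule of Eq. (6) can be sketched as follows. The helper name and signature are hypothetical, not the authors' implementation.

```python
import numpy as np

def assignment_probs(counts, last_spike_time, tau, alpha, r_abs=1.0):
    """Normalised cluster-assignment probabilities following Eq. (6).

    counts[k] is the urn count m^t_k, last_spike_time[k] the occurrence time
    of the last spike assigned to cluster k, tau the current spike time (ms),
    r_abs the absolute refractory period (the 1 ms default is illustrative).
    The final entry is the probability of opening a new cluster (prop. alpha)."""
    p = np.append(np.asarray(counts, dtype=float), alpha)
    for k, t_hat in enumerate(last_spike_time):
        if tau - t_hat <= r_abs:
            p[k] = 0.0  # refractory period violation: cluster k is forbidden
    return p / p.sum()

# Cluster 0 fired 0.4 ms ago, so it is removed as an option for this spike.
probs = assignment_probs([10, 5], [99.6, 80.0], tau=100.0, alpha=0.01)
```

This is the "restaurant where a just-used table is temporarily unavailable" metaphor made concrete: zeroing an entry redistributes its mass over the remaining tables and the new-cluster option.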
Neither the correct number of distinct neurons recorded from by the extracellular electrode nor the correct labeling for any spikes not originating from the neuron recorded intracellularly can be determined by this methodology.\n\n1http://crcns.org/data-sets/hc/hc-1/\n\nData set | DPM: FP, FN, RPV | GPUDPM: FP, FN, RPV\n1 MAP | 4.90%, 4.21%, 4 | 4.71%, 1.32%, 0\n1 AVG | 5.11%, 5.17%, 4 | 4.77%, 1.68%, 0\n2 MAP | 0.94%, 9.40%, 1 | 0.85%, 18.63%, 0\n2 AVG | 0.83%, 12.48%, 1 | 0.86%, 18.81%, 0\n\nTable 1: Performance of both algorithms on the two data sets: % false positives (FP), % false negatives (FN), # of refractory period violations (RPV). Results are shown for the MAP solution (MAP) and averaged over the posterior distribution (AVG).\n\nThe subset of that data set that was used for the experiments consisted of two recordings from different animals (4 minutes each), recorded at 10 kHz. The data was bandpass filtered (300 Hz \u2013 3 kHz), and spikes on the intracellular channel were detected as the local maxima of the first derivative of the signal larger than a manually chosen threshold. Spikes on the extracellular channels were determined as the local minima exceeding 4 standard deviations in magnitude. Spike waveforms of length 1 ms were extracted from around each spike (4 samples before and 5 samples after the peak). The positions of the minima within the spike waveforms were aligned by upsampling, shifting and then downsampling the waveforms. The extracellular spikes corresponding to action potentials from the identified neuron were determined as the spikes occurring within 0.1 ms of the IC spike.\nFor each spike the signals from the four tetrode channels were combined into a vector of length 40. Each dimension was scaled by the maximal variance among all dimensions and PCA dimensionality reduction was performed on the scaled data sets (for each of the two recordings separately). 
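The feature-extraction step just described might be sketched as below. This is our reading of the preprocessing, not released code: the exact form of the variance scaling is an assumption, and `preprocess` is a hypothetical helper name.

```python
import numpy as np

def preprocess(waveforms, n_components=3):
    """Sketch of the feature extraction: waveforms is an (n_spikes, 40) array
    of concatenated tetrode-channel snippets.  Every dimension is scaled by
    the maximal per-dimension variance (one plausible reading of the text),
    then PCA reduces the result to a few components."""
    X = np.asarray(waveforms, dtype=float)
    X = X / X.var(axis=0).max()        # scale by the largest variance (assumed form)
    Xc = X - X.mean(axis=0)            # centre before PCA
    # PCA via SVD: right singular vectors are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

features = preprocess(np.random.default_rng(0).normal(size=(200, 40)))
```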
The \ufb01rst\nthree principal components were used as input to our spike sorting algorithm. The \ufb01rst recording\n(data set 1) consists of 3187 spikes, 831 originate from the identi\ufb01ed neuron, while the second (data\nset 2) contains 3502 spikes, 553 of which were also detected on the IC channel. As shown in Figure\n1, there is a clearly visible change in waveform shape of the identi\ufb01ed neuron over time in data set\n1, while in data set 2 the waveform shapes remain roughly constant. Presumably this change in\nwaveform shape is due to the slow death of the cell as a result of the damage done to the cell by the\nintracellular recording procedure.\nThe parameters for the prior (\u00b50, n0, a, b) were chosen empirically and were \ufb01xed at \u00b50 = 0,\nn0 = 0.1, a = 4, b = 1 for all experiments. The parameters governing the deletion procedure were\nset to \u03c1 = 0.985 and \u03b3 = 1 \u2212 10\u22125, re\ufb02ecting the fact that we consider relative \ufb01ring rates of the\nneurons to stay roughly constant over time and neuron death a relatively rare process respectively.\nThe variance of the proposal distribution \u03c3 was \ufb01xed at 0.01, favoring small changes in the cluster\nparameters from one time step to the next. Experiments on both data sets were performed for\n\u03b1 \u2208 {0.01, 0.005, 0.001} and the model was found to be relatively sensitive to this parameter in our\nexperiments. The sequential Monte Carlo simulations were run using 1000 particles, and multinomial\nresampling was performed at each step.\nFor comparison, the same data set was also sorted using the DPM-based spike sorting algorithm\ndescribed in [6]2, which pools waveforms over time and thus does not make use of any information\nabout the occurrence times of the spikes. The algorithm performs Gibbs sampling in a DPM with\nGaussian likelihood and a conjugate Normal-Inverse-Wishart prior. A Gamma prior is placed on the\nDP concentration parameter \u03b1. 
The parameters of the prior were set to \u00b5_0 = 0, \u03ba_0 = 0.1, \u03bb_0 = 0.1 \u00d7 I, a_0 = 1 and b_0 = 1. The Gibbs sampler was run for 6000 iterations, where the first 1000 were discarded as burn-in.\n\n4.2 Results\n\nThe performance of both algorithms is shown in Table 1. The data labelings corresponding to these results are illustrated in Figure 1. As expected, our algorithm outperforms the DPM-based algorithm on data set 1, which includes waveform drift which the DPM cannot account for. As data set 2 does not show waveform drift, it can be adequately modeled without introducing time dependence. The DPM model, which has the advantage of being significantly less complex than the GPUDPM, is able\n\n2Code publicly available from http://www.gatsby.ucl.ac.uk/~fwood/code.html\n\nFigure 1: A comparison of DPM to GPUDPM spike sorting for two channels of tetrode data for which the ground truth labeling of one neuron is known: (a, b) ground truth, (c, d) DPM, (e, f) GPUDPM. Each column shows subsampled results for one data set. In all plots the vertical axis is time and the horizontal axes are the first two principal components of the detected waveforms. The top row of graphs shows the ground truth labeling of both data sets where the action potentials known to have been generated by a single neuron are labeled with x's. Other points in the top row of graphs may also correspond to action potentials but as we do not know the ground truth labeling for them we label them all with dots. The middle row shows the maximum a posteriori labeling of both data sets produced by a DP mixture model spike sorting algorithm which does not utilize the time at which waveforms were captured, nor does it model waveform shape change. 
The bottom row shows the maximum a posteriori labeling of both data sets produced by our GPUDPM spike sorting algorithm which does model both the time at which the spikes occurred and the changing action potential waveshape. The left column shows that the GPUDPM performs better than the DPM when the waveshape of the underlying neurons changes over time. The right column shows that the GPUDPM performs no worse than the DPM when the waveshape of the underlying neurons stays constant.\n\nto outperform our model on this data set. The inferior performance of the GPUDPM model on this data set can also partly be explained by the inference procedure used: for the GPUDPM model, inference is performed by a particle filter using a relatively small number of particles (1000), whereas a large number of Gibbs sampler iterations (5000) are used to estimate the posterior for the DPM. With a larger number of particles (or samples in the Gibbs sampler), one would expect both models to perform equally well, with possibly a slight advantage for the GPUDPM, which can exploit the information contained in the refractory period violations. As dictated by the model, the GPUDPM algorithm does not assign two spikes that are within the refractory period of each other to the same cluster, whereas the DPM does not incorporate this restriction, and therefore can produce labelings containing refractory period violations. Though only a relatively small number of such mistakes are made by the DPM algorithm, these effects are likely to become larger in longer and/or noisier recordings, or when more neurons are present.\nFor some values of \u03b1 the GPUDPM algorithm produced different results, showing either a large number of false positives or a large number of false negatives. 
In the former case the algorithm incorrectly places the waveforms from the IC channel and the waveform of another neuron in one cluster; in the latter case the algorithm starts assigning the IC waveforms to a different cluster after some point in time. This behavior is illustrated for data set 1 and \u03b1 = 0.01 in Figure 2, and can be explained by shortcomings of the inference scheme: while in theory the algorithm should be able to maintain multiple labeling hypotheses throughout the entire time span, the particle filter approach \u2013 especially when the number of particles is small and no specialized resampling scheme (e.g. [14]) is used \u2013 in practice often only represents the posterior accurately for the last few time steps.\n\nFigure 2: An alternative \u201cinterpretation\u201d of the data from the left column of Fig. 1 given by the GPUDPM spike sorter. Here the labels assigned to both the neuron with changing waveshape and one of the neurons with stationary waveshape change approximately half-way through the recording. Although it is difficult to see because the data set must be significantly downsampled for display purposes, there is a \u201cnoise event\u201d at the point in time where the labels switch. A feature of the DDP is that it assigns posterior mass to both of these alternative interpretations of the data. 
While for this data set we know this labeling to be wrong because we know the ground truth, in other recordings such an \u201cinjection of noise\u201d could, for instance, signal a shift in electrode position requiring similar rapid births and deaths of clusters.\n\n5 Discussion\n\nWe have demonstrated that spike sorting using time-varying Dirichlet process mixtures in general, and more specifically our spike sorting specialization of the GPUDPM, produces promising results. With such a spike sorting approach we are able, within a single model, to account for action potential waveform drift, refractory period violations, and neuron appearance and disappearance from a recording. Previously no single model addressed all of these simultaneously, requiring solutions in the form of ad hoc combinations of strategies and algorithms that produced spike sorting results that were potentially difficult to characterize. Our model-based approach makes it easy to explicitly state modeling assumptions and produces results that are easy to characterize. Also, more complex or application specific models of the interspike interval distribution and/or the data likelihood can easily be incorporated into the model. The performance of the model on real data suggests that a more complete characterization of its performance is warranted. Directions for further research include the development of a more efficient sequential inference scheme or a hybrid sequential/Gibbs sampler scheme that allows propagation of interspike interval information backwards in time. Parametric models for the interspike interval density for each neuron, whose parameters are inferred from the data and which can improve spike sorting results [15], can also be incorporated into the model. 
Finally, priors may be placed on some of the parameters in order to make the algorithm more robust and easily applicable to new data.\n\nAcknowledgments\n\nThis work was supported by the Gatsby Charitable Foundation and the PASCAL Network of Excellence.\n\nReferences\n[1] M. S. Lewicki. A review of methods for spike sorting: the detection and classification of neural action potentials. Network: Computation in Neural Systems, 9(4):53\u201378, 1998.\n[2] M. Sahani. Latent variable models for neural data analysis. PhD thesis, California Institute of Technology, Pasadena, California, 1999.\n[3] D. P. Nguyen, L. M. Frank, and E. N. Brown. An application of reversible-jump Markov chain Monte Carlo to spike classification of multi-unit extracellular recordings. Network, 14(1):61\u201382, 2003.\n[4] D. G\u00f6r\u00fcr, C. E. Rasmussen, A. S. Tolias, F. Sinz, and N. K. Logothetis. Modeling spikes with mixtures of factor analyzers. In Proceedings of the DAGM Symposium, pages 391\u2013398. Springer, 2004.\n[5] F. Wood, S. Goldwater, and M. J. Black. A non-parametric Bayesian approach to spike sorting. In Proceedings of the 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 1165\u20131168, 2006.\n[6] F. Wood and M. J. Black. A nonparametric Bayesian alternative to spike sorting. Journal of Neuroscience Methods, 173:1\u201312, 2008.\n[7] F. Caron. Inf\u00e9rence Bay\u00e9sienne pour la d\u00e9termination et la s\u00e9lection de mod\u00e8les stochastiques. PhD thesis, \u00c9cole Centrale de Lille and Universit\u00e9 des Sciences et Technologies de Lille, Lille, France, 2006.\n[8] F. Caron, M. Davy, and A. Doucet. Generalized Polya urn for time-varying Dirichlet process mixtures. In 23rd Conference on Uncertainty in Artificial Intelligence (UAI 2007), Vancouver, Canada, July 2007.\n[9] A. Bar-Hillel, A. Spiro, and E. Stark. 
Spike sorting: Bayesian clustering of non-stationary data. Journal of Neuroscience Methods, 157(2):303\u2013316, 2006.\n[10] G. Santhanam, M. D. Linderman, V. Gilja, A. Afshar, S. I. Ryu, T. H. Meng, and K. V. Shenoy. HermesB: A continuous neural recording system for freely behaving primates. IEEE Transactions on Biomedical Engineering, 54(11):2037\u20132050, 2007.\n[11] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087\u20131092, 1953.\n[12] D. A. Henze, Z. Borhegyi, J. Csicsvari, A. Mamiya, K. D. Harris, and G. Buzs\u00e1ki. Intracellular features predicted by extracellular recordings in the hippocampus in vivo. Journal of Neurophysiology, 84(1):390\u2013400, 2000.\n[13] K. D. Harris, D. A. Henze, J. Csicsvari, H. Hirase, and G. Buzs\u00e1ki. Accuracy of tetrode spike separation as determined by simultaneous intracellular and extracellular measurements. Journal of Neurophysiology, 84(1):401\u2013414, 2000.\n[14] P. Fearnhead. Particle filters for mixture models with an unknown number of components. Statistics and Computing, 14:11\u201321, 2004.\n[15] C. Pouzat, M. Delescluse, P. Viot, and J. Diebolt. Improved spike-sorting by modeling firing statistics and burst-dependent spike amplitude attenuation: A Markov chain Monte Carlo approach. Journal of Neurophysiology, 91(6):2910\u20132928, 2004.", "award": [], "sourceid": 998, "authors": [{"given_name": "Jan", "family_name": "Gasthaus", "institution": null}, {"given_name": "Frank", "family_name": "Wood", "institution": null}, {"given_name": "Dilan", "family_name": "Gorur", "institution": null}, {"given_name": "Yee", "family_name": "Teh", "institution": null}]}