Estimating Poisson processes#

In the last section, we considered a few simple properties of Poisson processes, but in doing so, we implicitly made two crucial assumptions:

  1. We assumed the rate for the process, \(\lambda\), was known.

  2. We assumed \(\lambda\) was constant.

In this section, we consider how to deal with situations in which these assumptions are not valid.

Estimation: constant rate, one observation#

To begin, let’s consider what happens when we can assume that \(\lambda\) is constant but we need to estimate its value from data. A little more formally, given some data in the form of a count \(n\) in a time window of length \(T\), we want to figure out what our “best” guess is as to the \(\lambda\) that produced these data. And we’d like this procedure to work for any possible data we might be given. That is, we want some function (an estimator) \(\hat{\lambda}(n)\) that takes in \(n\) and returns an estimate of \(\lambda\).

Now, there are many, many possible ways to define such a function, and the “best” one will depend on what we want to do with it. At a conceptual level, you might think of it in terms of bets you could place on the estimation game: if some kinds of errors are more costly than others, you might want to shift your estimation procedure.

So to start, we’ll consider a very simple kind of estimator. We’ll just decide to choose the parameter \(\lambda\) that makes our observation \(n\) most likely (again, assuming that the data come from a Poisson process with constant rate). As we recall from the last section, this means that \(n\) is Poisson-distributed:

\[ p(n) = \frac{e^{-\lambda T} (\lambda T)^n}{n!} . \]
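
As a quick numerical sanity check (not part of the argument), we can evaluate this formula directly and compare it to an off-the-shelf Poisson distribution; the values \(\lambda = 5\) and \(T = 2\) below are arbitrary.

```python
# Sanity check: p(n) = exp(-lam*T) * (lam*T)**n / n!  matches scipy's Poisson pmf.
# lam = 5 events/s and T = 2 s are arbitrary illustrative values.
import numpy as np
from math import factorial
from scipy.stats import poisson

lam, T = 5.0, 2.0
n = np.arange(25)

p_direct = np.array([np.exp(-lam * T) * (lam * T) ** k / factorial(k) for k in n])
p_scipy = poisson.pmf(n, mu=lam * T)

assert np.allclose(p_direct, p_scipy)
```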

Now, for a given \(n\), we’d like to maximize this quantity by changing \(\lambda\). In other words, instead of viewing the right-hand side as a function of \(n\) (with \(\lambda\) fixed), as we have been, we fix \(n\) and view it as a function of \(\lambda\). Viewed this way, the right-hand side is called the likelihood, a quantity that shows up over and over in statistics.

So we want to maximize the likelihood as a function of \(\lambda\). In practice, we use a trick that often makes the math easier: we maximize the logarithm of the likelihood. Since \(\log\) is an increasing function, maximizing \(\log p(n)\) gives the same result as maximizing \(p(n)\), so this won’t change the outcome. Writing \(\mathcal{L}\) for the log likelihood, we have

\[ \mathcal{L}(\lambda) = \log p(n) = -\lambda T + n \log (\lambda T) - \log n! . \]

To maximize this, we recall from calculus that we want to take its derivative and set that to 0:

\[ \frac{d\mathcal{L}}{d\lambda} = 0 = -T + \frac{n}{\lambda} \quad \Rightarrow \quad \hat{\lambda} = \frac{n}{T} \]

This is a pretty sensible result. Our best estimate of the rate is simply the number of events divided by the total time.
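
As a minimal check of this result (the true rate \(\lambda = 10\) and window \(T = 100\) below are arbitrary), we can simulate a single count, confirm that \(n/T\) lands near the true rate, and verify that a brute-force grid search over the log likelihood picks out the same value.

```python
# Minimal simulation check of lambda_hat = n / T (true rate and window are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
lam_true, T = 10.0, 100.0

n = rng.poisson(lam_true * T)       # one observed count in a window of length T
lam_hat = n / T                     # closed-form maximum-likelihood estimate
print(f"count = {n}, lambda_hat = {lam_hat:.2f}, true lambda = {lam_true}")

# The same answer falls out of maximizing the log likelihood on a grid
# (the constant -log(n!) term is dropped since it does not depend on lambda).
lams = np.linspace(1e-3, 20.0, 4000)
loglik = -lams * T + n * np.log(lams * T)
print("grid argmax:", lams[np.argmax(loglik)])
```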

Estimation: constant rate, multiple observations#

Okay, so let’s consider what happens when we have multiple observations \(\lbrace n_i \rbrace\) for \(i = 1\ldots I\), each of which is observed in a time \(T_i\). Here, we need to consider not just the probability of one observation, but of all the observations taken together. Thankfully, as we noted before, if the observation times don’t overlap, these observations are statistically independent, and the probability of observing all our data is simply the product of the probabilities of the individual observations:

\[ p(\lbrace n_i \rbrace) = \prod_{i=1}^I \frac{e^{-\lambda T_i} (\lambda T_i)^{n_i}}{n_i!} \]

and the log likelihood is

\[ \mathcal{L} = -\lambda \sum_{i=1}^I T_i + \log \lambda \sum_{i=1}^I n_i + \sum_{i=1}^I \left(n_i \log T_i - \log n_i!\right) . \]

Now, once again taking the derivative and noting that the last term is constant with respect to \(\lambda\), we find that maximizing \(\mathcal{L}\) requires

\[ -\sum_{i} T_i + \frac{1}{\lambda} \sum_i n_i = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{\sum_i n_i}{\sum_i T_i} . \]

Here again, the result is perfectly sensible: we just add all the events and divide by the total time. And this works because the Poisson process treats non-overlapping sections of time as independent.
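
Here is a small sketch of this pooled estimator, using arbitrary window lengths \(T_i\) and a known true rate so we can see that \(\sum_i n_i / \sum_i T_i\) behaves sensibly.

```python
# Pooled estimate from several non-overlapping windows of different lengths.
# The true rate and the window lengths T_i are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(1)
lam_true = 4.0
T = np.array([5.0, 12.0, 3.5, 20.0])    # observation window lengths
n = rng.poisson(lam_true * T)           # independent counts, one per window

lam_hat = n.sum() / T.sum()             # add all the events, divide by the total time
print(f"counts = {n}, lambda_hat = {lam_hat:.2f}")
```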

Estimation: variable rate, multiple observations#

This is where it starts to get fun. Up to this point, we’ve assumed that the rate is a constant, but in many situations of interest in neuroscience, event rates are decidedly not constant. And when we consider rates that vary in time, we have to start making changes to our formulas. For example, if we have a rate function \(\lambda(t)\) for \(t \in [0, T]\), then the counts over the entire interval are still Poisson distributed,

\[ p(n) = \frac{e^{-\int_0^T \lambda(t) dt}(\int_0^T \lambda(t) dt)^n}{n!} \]

but we’ve had to replace \(\lambda T \rightarrow \int_0^T \lambda(t) dt\), which is another way of saying we have to sum up the total intensity over the interval to get a prediction for the counts.

Unfortunately, when it comes to estimating \(\lambda(t)\), we can get an intuition from this expression that simply summing events won’t be good enough: two rate functions with the same integral will produce identical Poisson distributions, so we won’t be able to tell them apart from count numbers alone.
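
We can see this indistinguishability directly: below, a constant rate and an oscillating rate (both arbitrary choices) integrate to the same total intensity over \([0, T]\), so they imply exactly the same count distribution.

```python
# Two different rate functions with the same integral give identical count distributions.
# Both example rates below integrate to 20 events over [0, 10] (arbitrary choices).
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import poisson

T = 10.0
t = np.linspace(0.0, T, 10_001)

lam_a = 2.0 * np.ones_like(t)                      # constant rate: 2 events/s
lam_b = 2.0 + 1.5 * np.sin(2 * np.pi * t / T)      # oscillating rate with the same mean

mu_a = trapezoid(lam_a, t)    # total intensity = integral of lambda(t) dt
mu_b = trapezoid(lam_b, t)

counts = np.arange(41)
assert np.allclose(poisson.pmf(counts, mu_a), poisson.pmf(counts, mu_b), atol=1e-6)
```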

So instead, we’ll pull the same trick we keep using in the case of the Poisson process. We’ll chop the interval \([0, T]\) into a lot of little intervals \(\Delta t = T/N\), and we’ll assume that in each of these intervals, the rate \(\lambda(t)\) can be approximated as constant. That is, if we consider discrete bins with edges (abusing notation) \(t\Delta t\), \(t = 1\ldots N\), then we have \(N\) rates \(\lambda_t\) to estimate.

Clearly, we can perform this estimate using the methods above, provided we make one additional assumption: the rate function \(\lambda(t)\) needs to be the same for every observation. That is, the time course of the intensity function needs to be repeated multiple times so that when we combine observations, we are estimating the same \(\lambda_t\). More explicitly, if \(t\) continues to index time and \(i\) indexes repeats, our estimator is

\[ \hat{\lambda}_t = \frac{\sum_i n_{it}}{\sum_i \Delta t} = \frac{\sum_i n_{it}}{I \Delta t} . \tag{1} \]

A few important remarks on this formula:

  1. As before, the estimator has a simple form: add event counts, divide by total observation time. But here, we are being careful not to combine counts across time bins with different rates.

  2. This new estimator depends on the bin size \(\Delta t\), which is something of a free parameter. As we will see in the next section, the choice of this parameter not only reflects our assumptions about time-varying event rates but also controls tradeoffs in our ability to accurately estimate event rates versus changes in those rates.

  3. Again, the assumption that \(\lambda(t)\) has the same time course across repeats \(i\) is critical to this result. If each repeat (think trial) has a distinct time course \(\lambda_i(t)\), our estimator above is only estimating the mean rate across repeats.
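
To make (1) concrete, here is a minimal sketch of the estimator in code. The data format (a list of event-time arrays, one per repeat, all aligned to the same clock) and the simulated rate function are assumptions for illustration, not a fixed interface.

```python
# Minimal sketch of estimator (1): pool counts across repeats, bin by bin.
# spike_times is assumed to be a list of arrays of event times, one per repeat,
# all measured on the same clock within [0, T].
import numpy as np

def estimate_rate(spike_times, T, dt):
    """PSTH-style estimate: lambda_hat[t] = (sum over repeats of n_it) / (I * dt)."""
    edges = np.arange(0.0, T + dt, dt)
    counts = np.zeros(len(edges) - 1)
    for times in spike_times:                    # accumulate counts over repeats i
        counts += np.histogram(times, bins=edges)[0]
    return counts / (len(spike_times) * dt)      # events per unit time in each bin

# Simulated example: the same (arbitrary) rate function on every repeat,
# sampled by thinning a homogeneous Poisson process at the peak rate.
rng = np.random.default_rng(2)
T, dt, n_repeats, lam_max = 1.0, 0.05, 50, 20.0
lam = lambda t: lam_max * np.exp(-((t - 0.5) ** 2) / 0.02)

trials = []
for _ in range(n_repeats):
    cand = rng.uniform(0.0, T, rng.poisson(lam_max * T))   # candidate event times
    trials.append(np.sort(cand[rng.uniform(0.0, lam_max, cand.size) < lam(cand)]))

lam_hat = estimate_rate(trials, T, dt)
```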

Estimation: variable rate, single observation#

This is the hardest case. Clearly, even if we only have one observation, we can still use the estimator (1), but you may also suspect that when count numbers are low (perhaps only 0 or 1), this may be a very bad estimator. And you would be correct. In general, estimating single-trial event rates is a challenging research problem that involves making one or more additional assumptions about the data. For example, we might assume:

  • Event rates are controlled by some other set of known variables (think stimuli) in a parametric way (e.g., [Pillow et al., 2008]). Then adjusting for these parameters gives us a way to combine multiple observations in our estimates.

  • The event rates \(\lambda(t)\) are smooth (with some known smoothness; e.g. [Caruso et al., 2018]). The smoothness assumption effectively lets us use nearby time bins to improve our estimates (see the sketch after this list).

  • We are observing multiple event streams (think neurons) whose event rates \(\lambda_j(t)\) are correlated (e.g., [Pandarinath et al., 2018]). Here, the multiple correlated streams function somewhat like multiple observations.
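
As one concrete illustration of the smoothness idea (the second bullet above), here is a minimal sketch that smooths a single trial’s binned counts with a Gaussian kernel. The kernel width is a free parameter encoding the assumed smoothness, and the function name and interface are just for illustration; this is not the method of the cited papers.

```python
# Minimal sketch of a smoothness-based single-trial estimate: bin one trial's events,
# then smooth with a Gaussian kernel whose width (sigma_s, in seconds) encodes the
# assumed smoothness. Illustrative only, not the approach of the cited references.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smoothed_rate(spike_times, T, dt, sigma_s):
    """Kernel-smoothed single-trial rate estimate (events per unit time)."""
    edges = np.arange(0.0, T + dt, dt)
    counts = np.histogram(spike_times, bins=edges)[0].astype(float)
    return gaussian_filter1d(counts, sigma=sigma_s / dt) / dt   # sigma converted to bins
```

With the simulated trials from the previous sketch, something like `smoothed_rate(trials[0], T, dt=0.005, sigma_s=0.03)` (hypothetical parameter values) would give a single-trial estimate on a finer grid than the pooled estimate above.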

Coda#

Much of the chapter so far has focused on theory. In the next sections, we’ll turn to practicalities of how to use the theory we’ve established here to estimate event rates from real data.