# FAQS

## Q. Does `saspt`

provide a way to do tracking?

`saspt`

only analyzes the *output* of a tracking algorithm; it doesn’t
produce the trajectories themselves. The general workflow is:

Acquire some raw SPT movies

Use a tracking algorithm to produce trajectories from the SPT movie

Feed the trajectories into

`saspt`

(or whatever your favorite analysis tool is)

There are lots of good tracking algorithms out there.
In the sample data included in the `saspt`

repo, we used an in-house tracking
tool with a graphic user interface (quot). But - you know - we’re biased.

Other popular options are TrackMate, the multiple-target tracing algorithm, or Sbalzerini’s hill climbing algorithm. There are new tracking algorithms every day; use Google to see for yourself.

## Q. Why doesn’t `saspt`

support input format *X*?

Because a table of detections is probably the simplest format that exists to describe trajectories, so we added it first. We’re happy to expand support for additional formats (within reason) - let us know with a GitHub request.

## Q. Why are the default diffusion coefficients log-spaced?

As the diffusion coefficient increases, our estimate of it becomes much more error-prone. This makes it difficult for humans to compare the occupations of states with widely varying diffusion coefficients. By plotting on a log scale, we minimize these perceptual differences so that humans can accurately compare states across the full range of biologically observed diffusion coefficients.

To demonstrate this effect, consider the likelihood function for the jumps of a 2D Brownian motion with no localization error, diffusion coefficient \(D\), frame interval \(\Delta t\), and \(n\) total jumps. The maximum likelihood estimator for the diffusion coefficient is the mean-squared displacement (MSD):

where \((\Delta x_{j}, \Delta y_{j})\) is the \(j^{\text{th}}\) jump in the trajectory.

We can get the “best-case” error in this estimate using the Cramer-Rao lower bound (CRLB), which provides the minimum variance our estimator \(\hat{D}\):

So the error in the estimate of \(D\) actually increases as the
*square* of \(D\). If we throw in localization error (represented
as the 1D spatial measurement variance, \(\sigma^{2}\)) and neglect the
off-diagonal terms in the covariance matrix, we get the approximation

Notice that this makes it even *harder* to estimate the diffusion coefficient,
especially when \(D \Delta t < \sigma^{2}\).

## Q. How does `saspt`

estimate the posterior occupations, given the posterior distribution?

`saspt`

always uses the posterior mean. If \(\boldsymbol{\alpha}\) is the parameter to the posterior Dirichlet distribution over state occupations, then the posterior mean \(\boldsymbol{\tau}\) is simply the normalized Dirichlet parameter:

We prefer the posterior mean to max *a posteriori* (MAP) or other estimators because
it is very conservative and minimizes the occurrence of spurious features.

## Q. I want to measure the fraction of particles in a particular state. How do I do that?

If you know the range of diffusion coefficients you’re interested in, you can directly integrate the mean posterior occupations. Say we want the fraction of particles with diffusion coefficients between 1 and 10 \(\mu\text{m}^{2}\)/sec:

```
>>> occupations = SA.posterior_occs_dataframe
>>> in_range = (occupations['diff_coef'] >= 1.0) & (occupations['diff_coef'] < 10.0)
>>> print(occupations.loc[in_range, 'mean_posterior_occupation'].sum())
```

That being said, `saspt`

does *not* provide any way to determine the
endpoints for this range, and that is up to you or the methods you develop.

## Q. What is defocalization?

Examining the movie in the section Background, you may notice that the particles are constantly wandering into and out of focus. The faster they move, the faster they escape the microscope’s focus.

As it turns out, this behavior (termed “defocalization”) has a dangerous side effect. If we
want to know the *state occupations* - the fraction of proteins in a particular state -
we may be tempted to report the *fraction of observed trajectories in that state*.
The problem is that particles in fast
states contribute many short trajectories, because they can wander in and out of focus
multiple times before bleaching. By contrast, particles in slow states produce a
few long trajectories; they don’t move fast enough to reenter the focal volume before bleaching.

As a result, a mixture of equal parts fast and slow particles does *not* produce
equal parts fast and slow trajectories.

Defocalization is the reason why the “MSD histogram” method - one of the most popular approaches to analyze protein tracking data - yields inaccurate results when applied to 2D imaging. A more detailed discussion can be found in the papers of Mazza and Hansen and Woringer.

`saspt`

avoids the state estimation problem by computing state occupations in terms of *jumps*
rather than trajectories. Additionally, an analytical correction factor (analogous to the empirical
correction factor from Hansen and Woringer) can be
applied to the data by passing the `focal_depth`

parameter when constructing a `StateArray`

or `StateArrayDataset`

object.