Exploring Image data#
Image data in neurobiology can be broadly divided into two types:

- Static images, including histology, gene expression maps, and other 2- and 3-d images. These represent data at a single time point.
- Dynamic images, including calcium and fMRI imaging. These represent data as a sequence of images, with one or more images collected at each time point.
Loading the data#
This week’s data are available for download here (right click and “save as”), or can be downloaded from Google Drive via
!gdown 10mbjgJIkyaczKs47_f4es0M3AI7gqTy3
on Colab or in the notebook. The data are recordings of calcium imaging experiments in mouse V1 that come to us courtesy of Ashley Wilson in Lindsey Glickfeld’s lab. The data are once again in .mat format. This week, they are large by the standards of what we’ve been working with (~200MB), though far smaller than such datasets are for real experiments. In the actual experiments, images are taken at 30 Hz (30 images per second), whereas the sample data are downsampled to 3 Hz.
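The 30 Hz to 3 Hz downsampling can be done by averaging non-overlapping blocks of 10 frames. The actual preprocessing used for these data isn't specified here, but a minimal sketch on a synthetic movie might look like:

```python
import numpy as np

# synthetic stand-in movie: 60 frames of 4 x 5 pixels, nominally at 30 Hz
rng = np.random.default_rng(0)
movie = rng.random((4, 5, 60))

# downsample 30 Hz -> 3 Hz by averaging blocks of 10 consecutive frames
factor = 10
n_keep = (movie.shape[-1] // factor) * factor       # drop any ragged tail
binned = movie[..., :n_keep].reshape(4, 5, -1, factor).mean(axis=-1)

print(binned.shape)  # (4, 5, 6)
```

Block-averaging (rather than simply keeping every 10th frame) also reduces noise, which is one reason it is a common choice.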
In the experiment that generated these data, the mice were exposed to a series of drifting grating stimuli, and recorded responses were images reflecting calcium fluorescence levels. The stimuli consisted of drifting gratings at a variety of orientations, probing the sensitivity of cells in V1 to both orientation and motion direction.
The variables in the data file are:

- data: This week, it will be up to you to determine what dimensions correspond to what variables.
- dirTuningExp: A structure containing metadata about the experiment. In particular:
  - nTrials: Number of trials in the data.
  - stimOffFrames: Number of frames during which no stimulus was presented.
  - stimOnFrames: Number of frames during which each stimulus was presented.
  - tGratingDirectionDeg: Direction of drift of the moving grating on each trial.
Each trial (and thus the dataset) began with the stimulus off and then switched it on.
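scipy.io.loadmat returns Matlab structs as nested object arrays, which is why struct fields end up behind chains of `[0]` indexing. One way to see this without the real file is to round-trip a toy struct whose field names mirror dirTuningExp (the values below are made up):

```python
import os
import tempfile
import numpy as np
import scipy.io as sio

# write a toy .mat file containing a struct shaped like dirTuningExp
toy = {'dirTuningExp': {'nTrials': 2,
                        'stimOffFrames': 12,
                        'stimOnFrames': 6,
                        'tGratingDirectionDeg': np.array([0.0, 45.0])}}
path = os.path.join(tempfile.mkdtemp(), 'toy.mat')
sio.savemat(path, toy)

loaded = sio.loadmat(path)
exp = loaded['dirTuningExp']
print(exp.dtype.names)                 # the struct's field names
n_trials = exp['nTrials'][0][0][0][0]  # unwrap the nested (1, 1) arrays
print(n_trials)
```

Passing `squeeze_me=True` to loadmat removes many of these singleton dimensions, at the cost of making the unwrapping pattern less uniform.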
Exercise
Load the data. What is its shape?
Based on principles of memory layout we’ve discussed, which dimension of the array should be time, if we’re mostly interested in performing analysis on the individual images?
Plot sections of the data (as images) to determine which dimension of the
data
array is time.
Solution:#
# Load data
import scipy.io as sci
mat_contents = sci.loadmat('DirectionTuning_V1_dec.mat')  # avoid shadowing the builtin vars()
data = mat_contents['data']
dirTuningExp = mat_contents['dirTuningExp']
## shape of the data
Nvert, Nhorz, Nframes = data.shape
print(f"Data are {Nvert} x {Nhorz} x {Nframes}")
Data are 125 x 350 x 1440
If we’re using C memory layout (“row-major order,” with consecutive data in rows), then index 0 should be time, since that is the index that advances slowest. In Matlab or Julia (with FORTRAN or column-major order), time would need to be the last index, since that’s the one that changes slowest. We want the slowest changing index to be time because all points at the same time are then consecutive in memory, and grabbing them for further manipulation and calculation will be faster.
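This layout point can be checked directly via NumPy's array flags. A small sketch on a stand-in array (shapes shrunk from the real 125 x 350 x 1440 to keep it light):

```python
import numpy as np

# stand-in for the loaded array: (rows, cols, time), as loadmat returns it
arr = np.zeros((20, 30, 40), dtype=np.float32)

# a single frame sliced from the LAST axis is not one contiguous block
print(arr[:, :, 0].flags['C_CONTIGUOUS'])  # False

# move time to axis 0 and copy into C order: now each frame is contiguous
movie = np.ascontiguousarray(np.moveaxis(arr, -1, 0))
print(movie[0].flags['C_CONTIGUOUS'])      # True
```

Note that np.moveaxis alone only changes strides; the ascontiguousarray copy is what actually rearranges memory.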
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
plt.imshow(data[:, :, 1], aspect='auto')
plt.colorbar()
plt.show()
plt.imshow(data[:, :, 5], aspect='auto')
plt.colorbar()
plt.show()
Converting from images to time series#
In addition to a sequence of images, we can also think of these data as a collection of time series, one per pixel.
Exercise
Extract the calcium time series for a few representative pixels. Plot them. Be sure your x-axis reflects the actual time between images/samples.
Solution:#
# let's look at a time series
import numpy as np
## take (55, 125) and (75, 180) as examples
dt = 1/3 # 3 Hz sampling
taxis = np.arange(0, Nframes) * dt
plt.plot(taxis, data[55, 125, :])
plt.plot(taxis, data[75, 180, :])
plt.xlabel('Time [s]')
plt.ylabel('Pixel Intensity')
plt.show()
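As an aside, raw intensities like these are usually converted to ΔF/F (fluorescence change relative to a baseline F0) before quantitative analysis. Standard pipelines choose F0 more carefully (e.g., a rolling percentile), but a minimal sketch that uses early stimulus-off frames as the baseline (all values below are synthetic) might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
trace = 100 + rng.random(60)   # fake pixel time series with a large offset

# pretend the first 12 frames are stimulus-off and use their mean as F0
f0 = trace[:12].mean()
dff = (trace - f0) / f0        # Delta F over F: fractional change from baseline

print(dff.shape)  # (60,)
```

The division by F0 puts pixels with very different raw brightness on a common, interpretable scale.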
Converting from arrays to movies#
For spatiotemporal data, one of the best ways to gain qualitative insight is by using your eye as a pattern detector. For a sequence of images, Python, like many languages, will allow us to play them as a movie. There’s a simple example here (Colab-specific here) and more complex tutorials here and here. I cannot overemphasize the power of creating animations and movies of your data.
Exercise
Using the examples above, make a movie of the data.
Make sure the colors in your plot are appropriately normalized. Different image functions have different expectations about the range of values in the data you feed them, but it might help, for example, to make sure that the values in each pixel are between 0 and 1 across all images. More specifically, make sure you are normalizing across images, not just within images.
Warning
Making a movie using all the data can take a long time. I suggest using a smaller number of frames (50-100) for a more reasonable runtime.
Tip
If you want to convert the animation object to HTML5 video, as in the Colab solution, but you aren’t working in Colab, you may need to install ffmpeg. I suggest
!conda install -c conda-forge ffmpeg
if you are using Anaconda.
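The point about normalizing across images rather than within them is easy to demonstrate: per-frame min-max scaling erases exactly the overall brightness changes the movie is supposed to show. A toy sketch:

```python
import numpy as np

# toy movie: same spatial pattern, but frame 1 is twice as bright overall
pattern = np.array([[0.0, 1.0], [2.0, 3.0]])
frames = np.stack([pattern, 2 * pattern])

# global normalization keeps the brightness difference between frames
gmin, gmax = frames.min(), frames.max()
global_norm = (frames - gmin) / (gmax - gmin)

# per-frame normalization maps both frames onto the SAME image
per_frame = np.stack([(f - f.min()) / (f.max() - f.min()) for f in frames])

print(np.array_equal(per_frame[0], per_frame[1]))  # True: brightness change lost
print(global_norm[0].max(), global_norm[1].max())  # 0.5 1.0
```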
Solution:#
# show a movie
import matplotlib.animation as anim
from IPython.display import HTML
## normalizer
pmax = np.max(data)
pmin = np.min(data)
fig = plt.figure()
vid = []
for i in range(50):
    vid_temp = plt.imshow((data[:, :, i] - pmin) / (pmax - pmin), animated=True, aspect='auto')
    vid.append([vid_temp])
plt.close()  # so we don't display the plot itself
vid = anim.ArtistAnimation(fig, vid, interval=dt * 1000)
HTML(vid.to_html5_video())
## to save the video, uncomment this:
# vid.save('basic_animation.mp4', fps=int(1/dt))
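ArtistAnimation stores one image artist per frame, which gets memory-hungry for long movies. A FuncAnimation variant that reuses a single image object is a common alternative; here is a sketch with a random stand-in array (so it runs on its own) and the Agg backend (so no display is needed):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')             # render off-screen; no display required
import matplotlib.pyplot as plt
import matplotlib.animation as anim

rng = np.random.default_rng(2)
movie = rng.random((20, 30, 50))  # stand-in for the real data array
dt = 1 / 3                        # 3 Hz sampling, as in these data

fig, ax = plt.subplots()
# vmin/vmax handle the across-frames normalization directly
im = ax.imshow(movie[:, :, 0], vmin=0, vmax=1, aspect='auto')

def update(i):
    # update the existing image instead of creating a new artist per frame
    im.set_data(movie[:, :, i])
    return [im]

vid = anim.FuncAnimation(fig, update, frames=movie.shape[-1],
                         interval=dt * 1000, blit=True)
plt.close(fig)
```

Passing vmin and vmax to imshow is another way to satisfy the normalization requirement in the exercise: every frame is then mapped through the same color scale.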
Tuning curves: a first pass#
For each recorded neuron in the movie, we might like to assess its sensitivity to both orientation and motion direction. To do this, we first need to find the locations of cells within the image (and they might move), then appropriately average the calcium fluorescence time series, and finally assess whether each cell is tuned and to what degree.
But for programming, we should start simple, with the most straightforward version we can think of: let’s try to assess the tuning of each pixel and do so with a back-of-the-envelope sort of calculation that we can refine as we go.
Exercise
Let’s start with a fixed point in the image (e.g., index (45, 199)). Plot the calcium fluorescence time series for that point.
For the stimulus-off baseline and each orientation, find the mean calcium activation. There are lots of ways to do this. Plot the tuning curve as a function of motion direction. Make sure to label the x-axis appropriately and indicate the baseline activation level.
Find the orientation for which activation is maximal. Do this programmatically, since we’ll want to automate this for each pixel later.
Solution:#
# let's plot the time series for one pixel
pt_in_img = [45, 199]
tseries = data[pt_in_img[0], pt_in_img[1], :]
plt.plot(np.arange(Nframes) * dt, tseries)
plt.xlabel('time (s)');
# ...and figure out its tuning
# each trial: stimulus off for 12 frames, then on for 6
# strategy: label each frame with its stimulus condition, then take a group mean per condition
## first clean data
labels = -1 * np.ones([Nframes, 1])
codes = dirTuningExp['tGratingDirectionDeg'][0][0][0]
Ntrials = dirTuningExp['nTrials'][0][0][0][0]
Noff = dirTuningExp['stimOffFrames'][0][0][0][0]
Non = dirTuningExp['stimOnFrames'][0][0][0][0]
offset = 0
for i in range(Ntrials):
    offset = offset + Noff
    labels[offset + np.array(range(Non))] = codes[i]
    offset = offset + Non
## then convert the data into table structure
import pandas as pd
dat = {'orientation': labels.ravel(), 'trace': tseries}
dat = pd.DataFrame(data = dat)
## convert orientation to categorical type
dat['orientation'] = dat['orientation'].astype('category')
## compute average
tuning = dat.groupby(by='orientation', observed=False).mean()
print(tuning)
# plot the tuning curve
tuning_toplot = tuning.iloc[1:]
tuning_toplot.plot()
xlim = plt.xlim()
plt.plot(xlim, [tuning['trace'].iloc[0]] * 2, 'r')
plt.legend(('trace','baseline'))
                    trace
orientation
-1.0         29577.029297
0.0          29163.691406
45.0         30960.468750
90.0         29730.621094
135.0        29330.035156
180.0        30097.125000
225.0        29211.435547
270.0        29040.560547
315.0        34516.226562
# find maximally tuned orientation
# a simple (but brittle) way to assess tuning: find the condition with the
# largest mean response, ignoring the baseline row
# note: this comparison can return more than one value in case of ties!
temp = tuning_toplot.loc[tuning_toplot['trace'] == np.max(tuning_toplot['trace']), :]
# the stimulus directions we tested
orientations = tuning_toplot.index.values
# take the first match as the preferred direction
preferred = temp.index.values[0]
print(f"Maximal tuning at {preferred} degrees.")
Maximal tuning at 315.0 degrees.
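Looking ahead to automating this for every pixel: the groupby approach handles one pixel at a time, but since the labels vector assigns a condition to every frame, boolean masks over the time axis give mean-response maps for all pixels at once. A sketch on a synthetic movie (names like n_off and n_on are stand-ins for the real Noff and Non):

```python
import numpy as np

rng = np.random.default_rng(3)
movie = rng.random((4, 5, 36))                 # stand-in for the data array
directions = np.array([0.0, 90.0, 0.0, 90.0])  # stand-in for codes, one per trial
n_off, n_on = 6, 3                             # frames off, then on, per trial

# rebuild per-frame condition labels, exactly as in the loop above
labels = -1.0 * np.ones(movie.shape[-1])
offset = 0
for d in directions:
    offset += n_off
    labels[offset:offset + n_on] = d
    offset += n_on

# mean response of EVERY pixel to each direction, via one masked average each
conditions = np.unique(labels[labels >= 0])
tuning_maps = np.stack([movie[:, :, labels == c].mean(axis=-1)
                        for c in conditions])   # (n_conditions, rows, cols)

# preferred direction for every pixel at once
preferred = conditions[np.argmax(tuning_maps, axis=0)]
print(preferred.shape)  # (4, 5)
```

This replaces the per-pixel groupby with a handful of whole-array operations, which is the shape the full-image analysis will want to take.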