Notes, assignments, and code for NEUROBIO 735 (Spring 2024).
1/11 – 4/17:
Tuesday, Thursday
3:00 – 4:30
301 Bryan Research Building
For this homework, we’ll explore another tabular dataset involving animal choice behavior from this paper. In the experiment, male monkeys were given a choice between delivery of two juice rewards, one of which was paired with an image. The images were chosen from one of four categories — dominant male monkeys, subordinate male monkeys, images of female monkey sex skin, and a gray square — while the relative amount of juice available for each option varied across choices. When animals were indifferent between the juice and juice-plus-image options for a given discrepancy in juice values, that difference was taken to be the value of the paired image. At issue in the experiment is how image values differed by category and across individuals.
In this homework, we will replicate pieces of this analysis by fitting choice models to subsets of the data. In doing so, we’ll have recourse to the split-apply-combine pattern, which allows us to first define an analysis on an arbitrary subset of data and then apply this analysis across some grouping of the data set. The key advantage of this method is that it separates the analysis itself from the method used to split the data into groups, resulting in more flexible, reusable code.
The data are contained in this csv file with columns:

- monk: animal doing the task
- session: session number (unique across all animals)
- piccat: picture category (one of the four described above)
- dv: difference in value between the juice-plus-image and juice-only options; positive values mean more fluid for choosing the juice-plus-image option
- Nimg: number of times the animal chose the juice-plus-image option
- Ntot: total number of times the choice was presented
- sessdate: date of the session in which these data were collected

Our goal will be to design an analysis capable of calculating the value of each image category in each session. We’ll do this by using split-apply-combine.
For many applications of the split-apply-combine method, it’s easiest to begin by designing the “apply” portion first. That is, we design and test the analysis we want to perform on an arbitrary subset of the data. To do this, we’ll often extract a representative subset of the data in order to perform testing.
Extract a subset of the data to use in developing the analysis. For instance, session 9, picture category 4.
Remove any rows in the data subset that correspond to no observations (Ntot = 0).
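These two steps can be sketched in pandas. The frame below is a tiny made-up stand-in for the real csv (the column names come from the list above; all of the values are invented for illustration):

```python
import pandas as pd

# Made-up stand-in for the real dataset; only the column names are real.
df = pd.DataFrame({
    "monk":    ["E", "E", "E", "E"],
    "session": [9, 9, 9, 9],
    "piccat":  [4, 4, 4, 4],
    "dv":      [-50, 0, 50, 100],
    "Nimg":    [1, 5, 9, 0],
    "Ntot":    [10, 10, 10, 0],
})

# "Split" by hand for now: session 9, picture category 4 ...
subset = df[(df["session"] == 9) & (df["piccat"] == 4)]

# ... and drop rows with no observations (Ntot == 0).
subset = subset[subset["Ntot"] > 0]
```

With the real data you would replace the hand-built frame with `pd.read_csv` on the file above.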
The most important tests we tend to do in scientific software involve plotting data and results. Because your eye is a much better pattern detector than most programmatic tests you might write, the fastest way to assess whether your code is functioning correctly is to generate plots.
For each value difference in your data subset, calculate the proportion of trials on which the animal chose the juice-plus-image option.
Plot this proportion as a set of points.
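For instance, assuming the filtered subset from the previous step (re-created here as a small made-up frame so the snippet runs on its own):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the filtered session-9 / category-4 subset; values invented.
subset = pd.DataFrame({"dv": [-50, 0, 50], "Nimg": [1, 5, 9], "Ntot": [10, 10, 10]})

# Proportion of trials on which the image option was chosen, per dv.
subset["pimg"] = subset["Nimg"] / subset["Ntot"]

# Plot the proportions as points.
plt.plot(subset["dv"], subset["pimg"], "o")
plt.xlabel("dv")
plt.ylabel("P(choose juice + image)")
plt.show()
```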
We expect that as dv increases, animals are more likely to choose the juice-plus-image option. Our observations in this case are numbers of times each choice was presented and numbers of times the image option was selected. For such observations, the standard statistical distribution is a binomial, and the typical model is logistic regression. In logistic regression, we assume
\[
\log \frac{p}{1-p} = X \beta
\]
where \(p\) is the probability of choosing the image, \(X\) is the matrix of regressors in our model (the variables we think might affect this probability), and \(\beta\) is a vector of coefficients (weights) for these variables. In our case, we will assume the simplest possible model, in which the probability of choosing the juice-plus-image option depends only on the difference in juice between the two options:
\[
\log \frac{p}{1-p} = \beta_1 + \beta_2 dv
\]
Our measure of interest will be the value of \(dv\) at which the animal is indifferent between the juice-only and juice-plus-image options, the so-called “point of subjective equality.” At this point, we can reason that \(V_1 = V_2 + V_{\mathrm{image}} \Rightarrow V_{\mathrm{image}} = V_1 - V_2 = -dv\). That is, the image value is minus the value of \(dv\) at which \(p=\frac{1}{2}\).
Derive an expression for the image value as a function of \(\beta\).
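As a sanity check on your derivation, one route (a sketch): at the point of subjective equality, \(p = \frac{1}{2}\), so the left-hand side of the regression equation is zero, giving

\[
0 = \beta_1 + \beta_2\, dv \quad\Rightarrow\quad dv = -\frac{\beta_1}{\beta_2} \quad\Rightarrow\quad V_{\mathrm{image}} = -dv = \frac{\beta_1}{\beta_2} .
\]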
Using StatsModels’s GLM class, fit the logistic regression. (Note: The inputs for Binomial family regressions, including the logistic, assume two columns for the dependent variable: the number of successes and the number of failures.) Don’t forget sm.add_constant()!

For model checking, we want to be sure that our fitted model captures any trends visible in the data. We can do this by plotting a model fit line on the same figure as our raw data.
Given a GLMResults object, we can use the get_prediction method to generate this fit line. We pass get_prediction a NumPy array of new values at which to predict the model’s output.
Generate a table of new dv values. These should be numerous and close enough together that the predicted values will plot as a smooth line.
Use the get_prediction method to get predictions for \(p\) at the new dv values. Alternatively, you can use the formula above to solve for \(p\) in terms of \(\beta\) and \(dv\).
Plot these predictions on the same figure as the original data.
Now that we know the analysis we want to perform on each data subset, we should be able to follow split-apply-combine:

- Call groupby, which breaks the data into groups. Ideally, we would give this a list of grouping variables using the by keyword argument.
- Call apply on the resulting GroupBy object. apply takes as its first argument a function used to evaluate each chunk of the DataFrame.
- Use groupby and apply along with your analysis function to calculate the value of each image category in each session.