Notes, assignments, and code for NEUROBIO 735 (Spring 2018).
1/10 – 2/8:
Wednesday, Thursday
3:00 – 4:30
DIBS conference room
Let’s start by loading this week’s sample data set, which you can retrieve from GitHub These data are from this study of social decision-making. Monkeys played a modified dictator game in which one, the “actor,” was confronted on each trial with one of two choices:
The data contain spike counts for several time periods during the trial, as well as the trial type (some trials were “cued” (i.e., forced choice)) and which animals were involved.
In this lesson, we’ll focus on exploring the data using Matlab, including some tidying, summary statistics, and plotting.
The most common format for storing tabular data is in comma-separated
value or .csv
format, in which each line of the file contains a row of
the table, with columns separated by commas. Often, the first line of the
file lists the column names, also separated by commas.
Load the data into Matlab. This can be done via the readtable
function.
Print the first 5 rows of the table. What are the column names? What is a way to find this out without printing the table directly (i.e., what if you had to find this out within a function)?
One of the downsides of text formats like .csv
is that files don’t specify the type of data in each column. Programs that read such data typically use heuristics to make an educated guess. In particular, programs need some policy for deciding whether string data (e.g., “chicken”) merely represent text or are categorical data (e.g., the column can only be “chicken,” “duck,” or “turkey”). When loading in tabular data, one often has to write boilerplate code to reinterpret some variables as categorical, others as dates, etc.
Which columns in the dataset should be categorical? What syntax would you use to replace this column with a categorical version of itself?
Write code that allows you to convert an arbitrary list of columns to categorical:
colname
is a variable containing a column name and tbl
a table, tbl.(colname)
selects the column. This also works for structs.)In some cases, tabular data may be read in as text when it should be coded as numeric.
What type is the cued
column? What should it be?
Write code that converts the cued
column to an appropriate type.
Many data tables are structured with multiple observations in each row, but for many analysis and visualization needs, data are better structured as one observation per row or “tidy” data. While not perfect for all purposes, tidy data provides a method for canonicalizing data so that software can focus on transferring to and from a fixed form.
In Matlab, the key functions for this purpose are stack
and unstack
. For a more full-featured approach in R, see tidyr.
What changes must be made to put this dataset in tidy format?
Write code that performs this conversion:
Having data in tidy format can facilitate easier comparisons across observation types. In Matlab, the relevant function for calculating statistics across categorical groups is grpstats
.
grpstats
to calculate the mean and variance of spike counts in each epoch.For comparing data across categories, the typical visualization is a box plot or violin plot. The former is available in Matlab, while the latter requires using third-party code.