Exploring table data

Let’s start by loading this week’s sample data set, which you can retrieve from GitHub. These data are from this study of social decision-making. Monkeys played a modified dictator game in which one monkey, the “actor,” faced one of two choices on each trial:

- a juice reward for both himself and the other monkey (BOTH) versus one for himself alone (SELF)
- a juice reward for the other monkey alone (OTHER) versus for no one (NONE)

The data contain spike counts for several time periods (epochs) within each trial, along with the trial type (some trials were “cued,” i.e., forced-choice) and which animals were involved.

In this lesson, we’ll focus on exploring the data using Matlab, including some tidying, summary statistics, and plotting.

Loading table data

One of the most common formats for storing tabular data is comma-separated values (.csv), in which each line of the file holds one row of the table, with columns separated by commas. Often the first line of the file lists the column names, also separated by commas.
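To see the format concretely, here is a minimal sketch that writes a tiny made-up table to a temporary .csv file with writetable and prints the raw text with fileread. The column names (trial, choice, count_cue) and the values are hypothetical, not taken from the real data set.

```matlab
% Tiny made-up table; column names and values are hypothetical
T = table((1:3)', {'BOTH'; 'SELF'; 'NONE'}, [12; 7; 9], ...
    'VariableNames', {'trial', 'choice', 'count_cue'});

fname = [tempname '.csv'];   % temporary file path
writetable(T, fname);        % header line, then one comma-separated line per row
txt = fileread(fname);       % raw text of the file
disp(txt)

% Split into lines: rows{1} holds the column names, rows{2:end} the data
rows = regexp(strtrim(txt), '\r?\n', 'split');
delete(fname);
```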

Load the data into Matlab using the readtable function.
Print the first 5 rows of the table. What are the column names? What is a way to find this out without printing the table directly (i.e., what if you had to find this out within a function)?
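One possible approach, sketched on a stand-in file: the code first writes a small made-up table to a temporary file so that it runs on its own; with the real data you would pass the path of the downloaded file to readtable instead. The column names here are hypothetical. A table's metadata, including its column names, lives in tbl.Properties, which is how you can get the names from inside a function without printing anything.

```matlab
% Stand-in file with hypothetical columns, so the sketch runs on its own;
% for the real data, pass the downloaded file's path to readtable instead
fname = [tempname '.csv'];
writetable(table((1:6)', [3; 5; 2; 8; 4; 6], ...
    'VariableNames', {'trial', 'count_cue'}), fname);

tbl = readtable(fname);                 % parse the .csv into a table
first5 = head(tbl, 5)                   % first 5 rows (no semicolon, so it displays)
names = tbl.Properties.VariableNames    % column names, without printing the data
delete(fname);
```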

Reformat data

Categorical

One of the downsides of text formats like .csv is that files don’t specify the type of data in each column. Programs that read such data typically use heuristics to make an educated guess. In particular, programs need some policy for deciding whether string data (e.g., “chicken”) merely represent text or are categorical data (e.g., the column can only be “chicken,” “duck,” or “turkey”). When loading in tabular data, one often has to write boilerplate code to reinterpret some variables as categorical, others as dates, etc.
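In Matlab, one way to handle this boilerplate up front is to let detectImportOptions record readtable's guesses and then override them with setvartype before reading. A minimal sketch, again on a stand-in file whose column names (actor, choice, count_cue) are hypothetical:

```matlab
% Stand-in file; the column names actor, choice, count_cue are hypothetical
fname = [tempname '.csv'];
writetable(table({'M1'; 'M2'; 'M1'}, {'BOTH'; 'SELF'; 'OTHER'}, [4; 9; 6], ...
    'VariableNames', {'actor', 'choice', 'count_cue'}), fname);

opts = detectImportOptions(fname);                            % readtable's guessed types
opts = setvartype(opts, {'actor', 'choice'}, 'categorical');  % override the text columns
tbl = readtable(fname, opts);                                 % read with the corrected types
delete(fname);
```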

Which columns in the dataset should be categorical? What syntax would you use to replace this column with a categorical version of itself?
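For a single column, the conversion is one assignment. This sketch assumes a hypothetical text column named choice; substitute whichever columns you decide should be categorical.

```matlab
% Hypothetical table: 'choice' was read in as text (a cell array of char)
tbl = table({'BOTH'; 'SELF'; 'BOTH'; 'OTHER'}, [4; 9; 6; 2], ...
    'VariableNames', {'choice', 'count_cue'});

tbl.choice = categorical(tbl.choice);   % replace the column with a categorical version
categories(tbl.choice)                  % list the distinct levels
```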
Write code that allows you to convert an arbitrary list of columns to categorical:
1. How would you specify a list of columns? What data structure would you use?
2. How would you generalize your code from the first part to transform a single column whose name is stored in a variable? (Hint: If colname is a variable containing a column name and tbl is a table, tbl.(colname) selects that column. This also works for structs.)
3. How would you repeat the process for every column in the list?
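One possible answer to all three parts: a cell array of names serves as the list of columns, and dynamic field names inside a loop convert each one. The table and its column names are hypothetical stand-ins.

```matlab
% Hypothetical table with three text columns and one numeric column
tbl = table({'M1'; 'M2'; 'M1'}, {'M3'; 'M3'; 'M4'}, {'BOTH'; 'SELF'; 'OTHER'}, [4; 9; 6], ...
    'VariableNames', {'actor', 'partner', 'choice', 'count_cue'});

% 1. A cell array of names is a convenient list of columns
catCols = {'actor', 'partner', 'choice'};

% 2-3. tbl.(col) selects a column by name; loop over the list
for k = 1:numel(catCols)
    col = catCols{k};
    tbl.(col) = categorical(tbl.(col));
end
```

Matlab R2018b and later also provide convertvars, which does the same thing in one call: tbl = convertvars(tbl, catCols, 'categorical');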

Text to numeric

In some cases, tabular data may be read in as text when it should be coded as numeric.

What type is the cued column? What should it be?
Write code that converts the cued column to an appropriate type.
1. How would you convert a single value in this column?
2. How would you apply this to every value in the column?
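A sketch assuming the cued column arrives as text such as '0' and '1' (the values and the table here are hypothetical; check what the real column contains). It parses one value with str2double, applies the same function to every cell with cellfun, and stores the result as a logical flag, since a trial is either cued or not.

```matlab
% Hypothetical: 'cued' was read in as text such as '0'/'1'
tbl = table({'0'; '1'; '1'; '0'}, [4; 9; 6; 2], ...
    'VariableNames', {'cued', 'count_cue'});
class(tbl.cued)                        % 'cell': text, not numbers

% 1. One value: parse the text in a single cell
v = str2double(tbl.cued{2});

% 2. Every value: apply the same function to each cell
%    (str2double also accepts the whole cell array directly)
cuedNum = cellfun(@str2double, tbl.cued);
tbl.cued = logical(cuedNum);           % store as a true/false flag
```

If the text is instead something like 'True'/'False', str2double returns NaN and logical(NaN) raises an error; a comparison such as strcmpi(tbl.cued, 'true') would be the analogous conversion in that case.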

Tidying data

Many data tables store multiple observations in each row, but for most analysis and visualization needs, data are better structured with one observation per row, known as “tidy” data. While not ideal for every purpose, the tidy format gives data a canonical layout, so software can focus on converting to and from a single fixed form.

In Matlab, the key functions for this purpose are stack and unstack. For a more full-featured approach in R, see tidyr.
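A minimal sketch of the round trip on a hypothetical wide table with one spike-count column per epoch (the names count_pre and count_post are invented for illustration). The NewDataVariableName and IndexVariableName arguments name the two columns that stack creates.

```matlab
% Hypothetical wide table: one row per trial, one spike-count column per epoch
wide = table((1:3)', [3; 5; 2], [8; 4; 6], ...
    'VariableNames', {'trial', 'count_pre', 'count_post'});

% stack: one row per (trial, epoch) pair; trial is copied onto each new row
long = stack(wide, {'count_pre', 'count_post'}, ...
    'NewDataVariableName', 'count', 'IndexVariableName', 'epoch');

% unstack reverses it, spreading 'count' back into one column per epoch
back = unstack(long, 'count', 'epoch');
```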

What changes must be made to put this dataset in tidy format?
Write code that performs this conversion:
- Make sure that any new columns you introduce are named appropriately. Their names can be passed as extra arguments to the relevant functions.
- It may be tempting to write out all the columns that need to be manipulated by hand, but work toward a solution that would work if you had 20 or 50 such columns. (Hint: do the column names form a pattern? How would you get the names of all columns that follow this pattern?)
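One way to avoid hard-coding the column list, assuming (hypothetically) that all spike-count columns share a name prefix such as count_: select the matching names with startsWith, pass the resulting cell array to stack, and optionally strip the prefix from the new epoch labels with renamecats. Check the real column names first, since the actual pattern may differ.

```matlab
% Hypothetical wide table; 'count_' is an assumed naming pattern
wide = table((1:3)', {'BOTH'; 'SELF'; 'NONE'}, [3; 5; 2], [8; 4; 6], [1; 7; 2], ...
    'VariableNames', {'trial', 'choice', 'count_pre', 'count_cue', 'count_post'});

% Every column whose name starts with the pattern, however many there are
names = wide.Properties.VariableNames;
countCols = names(startsWith(names, 'count_'));

long = stack(wide, countCols, ...
    'NewDataVariableName', 'count', 'IndexVariableName', 'epoch');

% Optional: drop the shared prefix so epoch levels read 'pre', 'cue', 'post'
long.epoch = renamecats(long.epoch, erase(categories(long.epoch), 'count_'));
```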

Group-level statistics

Having data in tidy format makes it easier to compare across observation types. In Matlab, the relevant function for calculating statistics across categorical groups is grpstats.

Use grpstats to calculate the mean and variance of spike counts in each epoch.
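A minimal sketch on a small hypothetical tidy table (made-up counts, not data from the study). grpstats belongs to the Statistics and Machine Learning Toolbox; with a table input it returns one row per group, with columns GroupCount, mean_count, and var_count.

```matlab
% Hypothetical tidy table with made-up counts: one row per (trial, epoch)
long = table(categorical({'pre'; 'pre'; 'pre'; 'cue'; 'cue'; 'cue'}), ...
    [3; 5; 4; 8; 12; 10], 'VariableNames', {'epoch', 'count'});

% Mean and (sample) variance of count within each epoch
stats = grpstats(long, 'epoch', {'mean', 'var'}, 'DataVars', 'count')
```

In base Matlab (R2018a and later), groupsummary(long, 'epoch', {'mean', 'var'}, 'count') computes similar summaries without the toolbox.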

Exploratory plotting

For comparing data across categories, the typical visualization is a box plot or violin plot. The former is available in Matlab, while the latter requires using third-party code.

Create a box plot of spike count versus epoch. What do you learn about these data? What could be problematic about comparing means of these distributions, as we did earlier?
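A sketch of the plot on a small hypothetical tidy table with made-up counts; boxplot (Statistics and Machine Learning Toolbox) accepts the categorical epoch column directly as its grouping variable.

```matlab
% Hypothetical tidy table with made-up counts
long = table(categorical({'pre'; 'pre'; 'pre'; 'cue'; 'cue'; 'cue'}), ...
    [3; 5; 4; 8; 12; 10], 'VariableNames', {'epoch', 'count'});

figure;
h = boxplot(long.count, long.epoch);   % one box per epoch level
xlabel('Epoch');
ylabel('Spike count');
```

Matlab R2020a and later also include boxchart in base Matlab, e.g. boxchart(long.epoch, long.count).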

Quantitative Neurobiology
