In our second week, we’ll be working with a ubiquitous data type in neurobiology: point process or event data. Typically, these events are action potentials, but they could also be EPSPs, vesicle releases, or even communication in social networks. We’ll focus on common exploratory data approaches, including the peri-stimulus/peri-event time histogram, and estimation of firing rates.
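As a rough preview of the peri-event time histogram, here is a minimal sketch. The spike times, event times, window, and bin width are all made up for illustration; they are not the class data set, and the loop is only one of several ways to do the alignment.

```matlab
% Hypothetical data: spike times and stimulus onset times, in seconds.
rng(1);                                   % make the fake data reproducible
spikeTimes = sort(rand(500, 1) * 100);    % 500 events spread over 100 s
eventTimes = (5:10:95)';                  % 10 stimulus onsets

window   = [-1, 2];     % seconds before and after each event (assumed)
binWidth = 0.05;        % 50 ms bins (assumed)
edges    = window(1):binWidth:window(2);

counts = zeros(1, numel(edges) - 1);
for k = 1:numel(eventTimes)
    aligned = spikeTimes - eventTimes(k);          % re-reference to this event
    counts  = counts + histcounts(aligned, edges); % spikes outside the window are ignored
end

% Convert summed counts to a mean firing rate in spikes/s.
rate       = counts / (numel(eventTimes) * binWidth);
binCenters = edges(1:end-1) + binWidth / 2;

bar(binCenters, rate, 1);
xlabel('Time from event (s)');
ylabel('Firing rate (spikes/s)');
```

Dividing by the number of events and the bin width is what turns raw counts into an estimate of firing rate; a smaller bin width gives finer time resolution at the cost of a noisier estimate.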
We’ll also be talking about refactoring, the process of iteratively rewriting code to make it cleaner, more useful, and more maintainable. As a result, much of this week’s in-class work will have you writing and rewriting the same analyses, each time making them more generic and reusable. Along the way, we’ll touch on related issues such as code smells, testing, and the red-green-refactor method.
So before our first class:
and before our second class:
Following up on our discussion in class today:
When using a categorical variable in your regression (e.g., genotype, drug/no-drug), if you fit an intercept (and this is the default), then you are actually fitting redundant parameters. For example, take a model with genotype (WT, KO) and drug (Y, N) as predictors, written `y ~ g * d` (with `g` genotype, `d` drug, and an interaction term). Its coefficients can be coded in two different ways.
In both cases, however, there are four and only four unique numbers. They are all just linear combinations of one another. The first case is known as treatment coding, while the second case goes by the name of sum coding. Depending on your use case, one or the other is usually more interpretable.
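To make the "linear combinations" point concrete, here is a small sketch that builds both design matrices by hand for the 2 x 2 example. The cell means are invented, and the choice of which level gets +1 under sum coding is a convention assumed here, not something Matlab's model-fitting functions necessarily share.

```matlab
% One row per cell of the 2 x 2 design, in the order:
% (WT, N), (WT, Y), (KO, N), (KO, Y)
isKO   = [0; 0; 1; 1];
isDrug = [0; 1; 0; 1];

% Treatment (dummy) coding: the reference cell is (WT, N).
% Columns: intercept, KO, drug, KO-by-drug interaction.
Xtreat = [ones(4, 1), isKO, isDrug, isKO .* isDrug];

% Sum coding: each factor is coded +1 / -1 (assumed: WT = +1, N = +1),
% so the intercept becomes the grand mean of the four cells.
g    = 1 - 2 * isKO;
d    = 1 - 2 * isDrug;
Xsum = [ones(4, 1), g, d, g .* d];

% Hypothetical cell means, in the same row order as above.
cellMeans = [10; 12; 11; 20];

betaTreat = Xtreat \ cellMeans;   % [10; 1; 2; 7]
betaSum   = Xsum   \ cellMeans;   % [13.25; -2.25; -2.75; 1.75]

% Either set of four coefficients reproduces the same four cell means,
% and one can be computed from the other.
betaSumFromTreat = Xsum \ (Xtreat * betaTreat);
```

Under treatment coding the intercept is the mean of the reference cell and the other coefficients are differences from it; under sum coding the intercept is the grand mean and the coefficients are deviations around it.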
For Matlab, the default is treatment (dummy) coding. Relevant docs are:
One of the issues that came up this week in our discussion of tidy data is whether there’s ever a case when having multiple observed numbers on each row of a table could be considered tidy. The example is Table 12 in the tidy data paper, in which both tmin and tmax are recorded in each row.
The answer is that, apart from a rigorous mathematical definition, there is some wiggle room in deciding whether a data set is tidy. For instance, consider an eye-tracking data set with eye position locations at every point in time.
Note that we have two observations, the x and y coordinates, in each row. By the strictest definition, the data are not tidy. However, there is never a case when we would measure one of these values without the other. The two of them could also be considered a single multivariate observation.
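For comparison, here is a sketch of the two layouts using a Matlab table. The sample values and variable names (`t`, `x`, `y`, `position`, `coordinate`) are hypothetical.

```matlab
% Hypothetical eye-tracking samples: time (s) and gaze position (pixels).
t = [0; 0.01; 0.02];
x = [512; 514; 530];
y = [384; 380; 377];

% "Wide" form: one row per time point, with x and y side by side.
wide = table(t, x, y);

% Strictly tidy "long" form: one measured number per row.
long = stack(wide, {'x', 'y'}, ...
    'NewDataVariableName', 'position', ...
    'IndexVariableName', 'coordinate');
```

The long form has twice as many rows and needs an extra column just to say which coordinate each number is, which is part of why the wide form is often more convenient for paired measurements like these.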
In general, I personally favor keeping multiple observations in the same row when
That’s my take. Your mileage may vary. Once you know the rule, you’re free to break the rule.
In our first week of class, we’ll be exploring Matlab’s capabilities for working with one of the most common data formats in all of science: tabular data. Data tables are the organizing principle behind spreadsheets, databases, and even advanced data visualization. Tabular data are typically organized with observations in rows and measured variables in columns, with the freedom to mix numbers, text, and dates in the same data structure.
Traditionally, Matlab has lacked facilities for dealing with tables, but in the last several years, it has introduced both categorical variables and data tables, making analysis of mixed-type data sets like surveys and behavioral data much more tractable.
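As a small taste of what this looks like, here is a sketch with invented behavioral data; the variable names and values are assumptions for illustration only.

```matlab
% Hypothetical behavioral data mixing text labels, numbers, and dates.
subject  = {'m01'; 'm02'; 'm03'; 'm04'};
genotype = categorical({'WT'; 'KO'; 'WT'; 'KO'});
latency  = [0.42; 0.61; 0.39; 0.58];          % seconds
session  = datetime(2016, 1, [4; 4; 5; 5]);   % session dates

% One observation per row, one measured variable per column.
T = table(subject, genotype, latency, session);

% Select the rows belonging to one level of the categorical variable.
ko = T(T.genotype == 'KO', :);

% Mean latency per genotype.
byGenotype = varfun(@mean, T, ...
    'InputVariables', 'latency', ...
    'GroupingVariables', 'genotype');
```

The result of `varfun` is itself a table, with one row per genotype, a `GroupCount` column, and a `mean_latency` column.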
So before our first class:
and before our second meeting:
First of all, don’t panic. This quiz is supposed to be challenging. My goal is to figure out how much basic programming you’re familiar with in Matlab so we can calibrate the course properly. In principle, the material here should be doable for anyone who’s completed something like this online class.
You can download the data for all the questions here. Data for each question are in the corresponding `.mat` file, which you can load into Matlab. This is a self-assessment. You do not need to mail your answers to me, but feel free to contact me to ask questions if you’re stuck. Use whatever resources you want to complete the assignment; programming is not about having things memorized.
So here goes:
Write a function that accepts as input a probability `p` and an integer `n`. The function should return a vector of `n` random binary numbers (either 0 or 1). Each entry should be 1 with probability `p`.
Given the data matrix `D`, with a separate data series in each row, write a function that returns a matrix `Z` of the same dimensions. Each row of `Z` contains the same data as the corresponding row of `D`, but z-scored (i.e., mean 0, standard deviation 1).
Given the matrix `A`, make a new matrix `B` consisting of all rows in `A` that contain no negative elements.
Given the two-column matrix `A` (first column, observations; second column, time points), plot the raw data (as points) and a solid line showing the smoothed data using any method you choose.
Write a function that takes a struct array `S` and returns a cell array with length equal to the number of fields of each struct in `S`. Each element of the returned cell array should collect all the values from the corresponding field in `S`. For example, if the elements of `S` are points with `x` and `y` fields, the returned value would be a cell array with two cells, one containing a vector of all the `x` values, the other all the `y` values.
For the last several years, particularly with the advent of Project Jupyter, documents integrating code with text, equations, and visualization (plots) have been a hot topic in scientific computing. While it is possible to run Matlab through a Jupyter notebook, Matlab has its own answer in the form of Matlab markup. It’s similar in spirit to the better-known Markdown, which is just a shorthand way of writing text that can easily be transformed to HTML. This HTML can then be inserted directly into webpages. In fact, a previous incarnation of this Matlab course was generated using this method.
I encourage you to get familiar with Matlab markup prior to the semester, since it may be useful in preparing your homework. The ability to mix text with code and visualization allows us to get closer to the idea of an analysis lab notebook, a practice we’ll encourage you to continue outside the class.
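To give a flavor of the syntax, here is a minimal sketch of a marked-up script. The numbers and the file name mentioned afterward are hypothetical.

```matlab
%% Simulated firing rate
% This paragraph is rendered as text. Markup such as *bold*, _italic_,
% and |monospace| is supported, as is inline LaTeX: $r = n / T$.
%
% * Bulleted lists start with an asterisk
% * Each line beginning with two percent signs starts a new section

n = 120;      % hypothetical spike count
T = 60;       % hypothetical recording duration (s)
r = n / T;    % mean firing rate (spikes/s)

%% Plot
% Figures produced by a section are captured in the published document.
bar(r);
ylabel('Firing rate (spikes/s)');
```

If this were saved as, say, `myanalysis.m`, calling `publish('myanalysis.m', 'html')` would run the code and write an HTML page containing the text, the code, and the figure.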
For those of you running Matlab R2016a or later, you may also want to take advantage of Live Scripts.