Notes, assignments, and code for NEUROBIO 735 (Spring 2024).
1/11 – 4/17:
Tuesday, Thursday
3:00 – 4:30
301 Bryan Research Building
In our last week on programming, we’ll pivot toward the rest of the semester’s material by talking about fitting computational models. Perhaps surprisingly, the actual mechanics behind model fitting are rarely discussed, so we’ll spend a little time on the theory of fitting and the rest of our time on the actual mechanics of writing code that does it.
So before our first class:
Read about SciPy’s minimize function.

In the most generic case, finding the minimum of an arbitrary function \(f(x)\) is computationally intractable: there is no guarantee we can find the global minimum. However, in many cases of interest, special properties of \(f(x)\), such as symmetries, can be exploited to solve the problem efficiently. Much of the research on optimization involves either proving that some class of optimization problems can be reduced to another with nice mathematical properties (so optimization can be performed efficiently) or designing new optimization algorithms that solve some class of problems faster (in fewer steps or less time per step).
For our second class this week, we’ll focus on simply using SciPy’s facilities for optimization without getting into details of the algorithms involved. That is, we’ll treat these as “black box” methods that take in a function \(f\) and spit out an answer.
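To make that concrete, here’s a minimal sketch of what black-box usage looks like; the quadratic objective below is just a stand-in for a real loss function:

```python
# A minimal sketch of black-box optimization with SciPy. Any callable
# that takes a parameter vector and returns a scalar will work here.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Simple bowl-shaped function with its minimum at (1, -2)
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2

x0 = np.zeros(2)           # starting guess
result = minimize(f, x0)   # defaults to a quasi-Newton method (BFGS)
print(result.x)            # -> approximately [1., -2.]
print(result.fun)          # objective value at the minimum
```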
So, before our second class:
In our fourth week, we’re going to pivot from our tour of data analysis to a set of skills that will become increasingly important as you investigate topics in quantitative neurobiology. Each week so far has featured a programming theme along with a data type:
These are all useful and applicable to even simple cases of running small data analyses, but for more complex work in modeling, we expect that code will grow, and with it the complexity of writing and maintaining larger software projects. To begin to tackle that complexity, this week will focus on debugging.
All beginning programming students are either taught or stumble upon two methods of debugging:

1. Peppering your code with print statements like `print('got to line 17')`, also known as “debugging by print.”
2. Aimlessly changing things until the code sort of works. Call it “debugging by guesswork” or, less charitably, “debugging by random walk.”
To these, the last few years have made available another advanced technique: debugging by posting on StackOverflow.
These techniques can be highly effective. No kidding. But as you do a better job of refactoring code into functions and profiling to remove bottlenecks, code can become more difficult to explore interactively without doing major surgery. That is why almost every serious programming language makes available a debugger.
Unfortunately, most advice about debugging tends to be language- and tool-specific. That can be helpful when you’re first starting, since your primary method of learning will be generalizing from your own experience with particular bugs, and knowing how to use a good tool at that point will result in dramatic improvements. But as you do more programming, you begin to see general patterns and gain experience with tracking down subtle bugs. The first set of readings this week deals with Python-specific material, the second with more general debugging strategy.
So before our first class:
and before our second class:
Read this and this for debugging strategies and tips.
Finally, for those of you debugging from a code editor, VS Code has excellent support for debugging Python as well as Jupyter notebooks. (Note: there is a lot of information in that debugging link. I suggest skimming first to get a sense of what’s useful, focusing on breakpoints and execution rather than fine-tuning configuration.)
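To give you a feel for the debugger before the readings, here’s a minimal sketch; the function is contrived, but `breakpoint()` is Python’s standard hook into the built-in debugger, pdb:

```python
# A minimal sketch of dropping into pdb. Running this script pauses at
# breakpoint(); from the (Pdb) prompt you can inspect variables
# (p total), step line by line (n), or continue (c).

def normalize(values):
    total = sum(values)
    breakpoint()   # execution pauses here; try `p total`, then `c`
    return [v / total for v in values]

print(normalize([1.0, 2.0, 3.0]))
```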
For week three, we’ll be working with imaging data. Two-photon imaging, to be precise, though other types of imaging present similar challenges. Unlike point process (i.e., spike) data, which are just collections of events — temporal data — imaging data are spatiotemporal: we must deal not only with time, but space as well. In practice, this means not only working with time series, but with images: time series of images.
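As a preview, here’s a minimal sketch of what that looks like in NumPy, with a random-noise movie standing in for real frames:

```python
# A minimal sketch of imaging data as a 3-D NumPy array
# (time x height x width). The "movie" is simulated noise.
import numpy as np

rng = np.random.default_rng(0)
movie = rng.poisson(5.0, size=(1000, 128, 128)).astype(float)  # frames

mean_image = movie.mean(axis=0)                    # average over time
trace = movie[:, 60:68, 60:68].mean(axis=(1, 2))   # time series from an ROI

# A common normalization: Delta F / F relative to a baseline estimate
f0 = np.percentile(trace, 10)
dff = (trace - f0) / f0
```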
On the coding side, we’ll be devoting some time this week to learning what makes code run fast, why some programming languages are faster than others, the strengths and weaknesses of Python as a coding platform when we need lots of computation, and the tools Python provides for helping us find and remove speed bottlenecks in our code.
Much of what we will learn can be summarized in the classic quote by computer scientist Donald Knuth:
“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”
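In other words: measure first, optimize second. Here’s a minimal sketch of that measurement step using the standard library’s cProfile, with toy stand-in functions (in a notebook, `%timeit` is handy for the same purpose):

```python
# A minimal sketch of finding a bottleneck with cProfile.
import cProfile

def slow_sum(n):
    # Deliberately slow: a pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total

def analysis():
    return [slow_sum(100_000) for _ in range(20)]

# Prints per-function call counts and times, sorted so the most
# expensive functions are easy to spot
cProfile.run("analysis()", sort="cumulative")
```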
So before our second class:
This week in class, we’ll be considering one of the most common data formats in all of science: tabular data. Data tables are the organizing principle behind spreadsheets, databases, and even advanced data visualization. Tabular data are typically organized with observations in rows and measured variables in columns, with the freedom to mix numbers, text, and dates in the same data structure.
In Python, these data are supported by the Pandas package through its `Series` and `DataFrame` classes. These data structures combine the efficiency and speed of NumPy arrays with special row and column index information that can be used to more easily group, subset, and reorganize heterogeneous data. The typical Pandas use case is a mixture of categorical, string, and numerical data, as is often found in behavioral and clinical applications.
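As a small taste, here’s a minimal sketch of that use case with made-up behavioral data:

```python
# A minimal sketch of mixed-type tabular data in Pandas; the values
# here are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2"],
    "condition": ["control", "drug", "control", "drug"],
    "rt": [0.41, 0.52, 0.38, 0.61],            # reaction time (s)
    "date": pd.to_datetime(["2024-01-16"] * 4),
})

# Row subsetting and split-apply-combine in a couple of lines
print(df[df.condition == "drug"])              # select matching rows
print(df.groupby("condition")["rt"].mean())    # mean RT per condition
```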
So before our second meeting:
We’re going to start the analysis section of the course by working with a ubiquitous data type in neurobiology: point process or event data. Typically, these events are action potentials, but they could also be EPSPs, vesicle releases, or even communication in social networks. We’ll focus on common exploratory data approaches, including the peri-stimulus/peri-event time histogram and estimation of firing rates.
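As a preview of where we’re headed, here’s a minimal sketch of a PSTH computed with plain NumPy, using simulated spike and event times:

```python
# A minimal sketch of a peri-event time histogram (PSTH). Spike and
# event times are simulated; real data would replace them.
import numpy as np

rng = np.random.default_rng(0)
spikes = np.sort(rng.uniform(0, 100, size=2000))  # spike times (s)
events = np.arange(5, 95, 10.0)                   # stimulus onsets (s)

window = (-0.5, 1.0)        # window around each event (s)
n_bins = 30                 # 50 ms bins
bins = np.linspace(window[0], window[1], n_bins + 1)
bin_width = (window[1] - window[0]) / n_bins

# Align spikes to each event and histogram the relative times
counts = np.zeros(n_bins)
for t in events:
    rel = spikes[(spikes >= t + window[0]) & (spikes < t + window[1])] - t
    counts += np.histogram(rel, bins=bins)[0]

# Convert counts to a firing rate (spikes/s) averaged over events
rate = counts / (len(events) * bin_width)
```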
We’ll also get practice refactoring, the process of iteratively rewriting code to make it cleaner, more useful, and more maintainable. As a result, much of this week’s in-class work will have you writing and rewriting the same analyses, each time making them more generic and reusable. Along the way, we’ll touch on related issues such as code smells, testing, and the red-green-refactor method.
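To illustrate a single refactoring step, here’s a minimal sketch in which a hard-coded snippet becomes a reusable function; the data file and names are hypothetical:

```python
# A minimal sketch of one refactoring pass.
import numpy as np

# Before: an analysis with magic numbers baked in
spikes = np.load("spikes.npy")   # hypothetical spike-time file
rate = ((spikes >= 10.0) & (spikes < 20.0)).sum() / 10.0

# After: the same logic, named and parameterized so it can be reused
def firing_rate(spike_times, start, stop):
    """Mean firing rate (spikes/s) between start and stop (s)."""
    n = ((spike_times >= start) & (spike_times < stop)).sum()
    return n / (stop - start)

rate = firing_rate(spikes, 10.0, 20.0)
```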
So before our first class:
and before our second class:
Optional reading:
Before we get going on our first week of class, let’s talk about how we’re going to do this:
The goal of the course is to give you a sampler of techniques and ideas in quantitative neurobiology, which we consider to encompass computation, data analysis, modeling, and theory. The course is divided into three main sections (with a final week for project presentations):
1. Introduction to programming (1/11 – 2/1): There are several good options here, but we’ll be using Python. More details below, but we will not assume prior programming experience.
2. Analyzing neural data (systems) (2/6 – 3/7): The goal here is to get you comfortable using programming to explore, visualize, analyze, and model several types of data generated by neuroscience experiments.
3. Analyzing neural data (cellular/molecular) (3/19 – 4/16): Here, we’ll use R to analyze data from cellular and molecular neuro experiments.
Class will be in person in 301 Bryan.
This year, we will encourage you to use Google Colab for your assignments. This has the advantage of standardizing our Python environment and allowing you to easily share assignments for grading. If you prefer to use a local machine, that’s fine; please just discuss with us in advance, since we can’t necessarily provide support to debug your setup. Colab uses a variant of Jupyter Notebooks, which are covered in the Python Data Science Handbook, but we will also cover the basics of navigating this in our first class.
We will have both in-class work and assignments. You are allowed to work collaboratively on homework, but your write-ups must be done independently. Please also note everyone you worked with when turning in your assignments.
Solutions should be submitted as saved Jupyter Notebooks or R Markdown. We’ll tell you where to put these and how to name them.
Solutions are due before class on Tuesdays (3pm EST). Given how difficult a period this is, we can work with you if something unexpected occurs, but we need to know in advance. Please help us help you. We can’t release solutions until we have everyone’s assignments turned in.
For the first several weeks of class, we’ll be offering a crash course in basic Python covering A Whirlwind Tour of Python, transitioning to the Python Data Science Handbook. This will be basic and focused on students who have limited programming background. This is purely optional. Those of you who are already comfortable with programming do not need to attend, though you will be responsible for the material. I will also be working with the TAs to set up additional help during this period for those who would like it.
This second phase of the course will cover five weeks and will focus on analyzing real neuroscience data sets.
I will not be lecturing. At least, not much. Most of what we’ll cover isn’t really learned effectively that way, so we’ll use our class time to complete programming and data analysis exercises that build on the basic Python knowledge you gained by reading A Whirlwind Tour of Python.
Each week, we’ll do two sessions of in-class assignments, for which you’ll be encouraged to work with a partner. The weeks are organized around both data and programming themes, and the in-class assignments often build on one another. After class is done for the day, we’ll post links to solutions. Typically, we’ll be walking you through an example analysis, with the goal of setting you up for the homework.
You are responsible for reading through the Python Data Science Handbook. I will try to (roughly) have assignments keep pace with the material in the book, but this will be loose.
You are also responsible for checking this website. All class materials will be posted here, as well as changes and corrections to homework assignments.
Please make use of the TAs and their office hours. I am also glad to help. If something is confusing with the assignments, the fault is probably mine, and you’re probably not alone. If you alert me early, we can probably fix it.
This will change as we go along, but in order to help you get started, here’s our tentative plan:
| Date | Topic | Exercises | Reading |
|---|---|---|---|
| 1/11 | Housekeeping, accessing computing, advanced Googling | | WWTP Ch. 1 – 6 |
| 1/16 | What can Python do? | NMA tutorial 1, NMA tutorial 2 | WWTP Ch. 7 – 8 |
| 1/18 | Data structures, iteration | notebook | WWTP Ch. 9 – 12 |
| 1/23 | Patterns, functions, duck typing | notebook | PDSH Ch. 1 |
| 1/25 | NumPy and arrays | notebook | PDSH Ch. 2 |
| 1/30 | Data frames | notebook | PDSH Ch. 3 |
| 2/1 | Plotting | Seaborn tutorials, Matplotlib tutorials | PDSH Ch. 4 |