GLMdenoise: a fast, automated technique for denoising task-based fMRI data
Kendrick Kay, Ariel Rokem, Jon Winawer, Bob Dougherty, Brian Wandell
Questions, comments? E-mail Kendrick

Introduction

GLMdenoise is a technique for denoising task-based fMRI data. The basic idea is to derive noise regressors from voxels unrelated to the experimental paradigm and to use these regressors in a general linear model (GLM) analysis of the data. The technique does not require external physiological monitoring nor does it require extra fMRI data—thus, it can be retrospectively applied to existing datasets. The technique is described in the following paper:

Kay, K.N., Rokem, A., Winawer, J., Dougherty, R.F., & Wandell, B.A. GLMdenoise: a fast, automated technique for denoising task-based fMRI data. Frontiers in Neuroscience (2013).

The technique is fast, automated, and general: it requires only a design matrix indicating the experimental design and an fMRI dataset. The technique does make a few assumptions, however. First, it assumes that voxels unrelated to the experimental paradigm are contained in the dataset (e.g. voxels in other brain areas, white matter, etc.). Second, it assumes that at least two runs are available and that experimental conditions are repeated across runs. This requirement is necessary because GLMdenoise involves cross-validation across runs.

The amount of improvement in signal-to-noise ratio (SNR) provided by GLMdenoise depends on a variety of factors, such as the intrinsic SNR level, the amount of data collected, and the experimental design. We suspect that SNR gains will tend to be more substantial for low-SNR regimes such as event-related experiments where stimulus durations are brief and evoked fMRI responses are small.

We provide MATLAB code implementing GLMdenoise. The code consists of standard (uncompiled) MATLAB functions, takes advantage of MATLAB's built-in multithreading, and requires only the Statistics Toolbox. The code has been tested on MATLAB 7.6 and 7.9, and should probably work with any recent version of MATLAB.

If you use GLMdenoise in your research, please cite the above paper.

Getting started

Get code from GitHub
After downloading and unzipping the files, launch MATLAB and change to the directory containing the files. You can then run the example scripts by typing "example1", "example2", etc. More information on the example scripts is provided below.

Download example dataset (82 MB)
This dataset is required to run the example scripts. Place the dataset inside the code directory; if it is not found there, the example scripts will download it automatically.

Join the GLMdenoise discussion group
This mailing list is for anyone interested in using the GLMdenoise toolbox. Announcements regarding GLMdenoise will be made to this mailing list.

Browse code in HTML (version 1.4)

Basics of the code

The input and output format of the code is intentionally generic. We would be happy to help integrate the code with existing analysis packages.

Inputs. The code requires the following inputs:

design — the experimental design (for each run, a matrix indicating the onsets of the experimental conditions)
data — the fMRI data (for each run, the voxel time-series)
stimdur — the duration of each trial, in seconds
tr — the sampling rate (TR) of the data, in seconds
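To make the format of the design matrix concrete, here is a rough sketch (in Python for illustration, though the toolbox itself is MATLAB; the onsets are made up): each run's design matrix has one row per time point and one column per condition, with a 1 marking the time point at which a trial of that condition begins.

```python
# Toy illustration of a design matrix for one run: rows are volumes
# (time points), columns are conditions, and a 1 marks the volume at
# which a trial of that condition begins. Onsets here are made up.
def make_design(n_volumes, onsets_per_condition):
    n_conditions = len(onsets_per_condition)
    design = [[0] * n_conditions for _ in range(n_volumes)]
    for cond, onsets in enumerate(onsets_per_condition):
        for onset in onsets:
            design[onset][cond] = 1
    return design

# Two conditions in a 10-volume run: condition 0 starts at volumes 2 and 7,
# condition 1 starts at volume 4.
design = make_design(10, [[2, 7], [4]])
```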
Note that the fMRI data should already be pre-processed. We recommend, at a minimum, performing slice-time correction and motion correction and discarding the first few volumes of each run to allow magnetization to reach steady-state. We do not recommend detrending or high-pass filtering the time-series; polynomial regressors are included in the GLM, and it is more accurate to model low-frequency fluctuations as part of the GLM. Spatial smoothing and/or atlas normalization are fine as pre-processing steps.
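As a rough illustration of why separate detrending is unnecessary (a Python sketch; the toolbox's actual polynomial construction may differ in detail), low-frequency drift can be captured by including polynomial nuisance regressors, one set per run, directly in the GLM:

```python
# Sketch of polynomial drift regressors: powers of time normalized to
# [-1, 1], yielding constant, linear, quadratic, ... nuisance columns
# that are included in the GLM instead of high-pass filtering the data.
def polynomial_regressors(n_volumes, max_degree):
    t = [2 * i / (n_volumes - 1) - 1 for i in range(n_volumes)]  # [-1, 1]
    return [[x ** d for d in range(max_degree + 1)] for x in t]

regs = polynomial_regressors(100, 2)  # constant, linear, quadratic columns
```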

Hint: Make sure that you have discarded the first few volumes of each run (to avoid initial magnetization effects) and that your data matrices do not contain any NaNs. Also, make sure that your data have not been mean-subtracted (or converted to percent BOLD change). For example, if you view a slice of your data, it should look like a brain (values should be largely positive).
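These checks are easy to automate. A minimal sketch (Python for illustration; the data format here, a list of voxel time-series, is hypothetical):

```python
import math

# Quick sanity checks on a run of fMRI data stored as a list of voxel
# time-series (hypothetical format): no NaNs, and values largely positive
# (i.e., the data have not been mean-subtracted).
def sanity_check(voxel_timeseries):
    flat = [v for ts in voxel_timeseries for v in ts]
    assert not any(math.isnan(v) for v in flat), "data contain NaNs"
    frac_positive = sum(v > 0 for v in flat) / len(flat)
    assert frac_positive > 0.5, "data may be mean-subtracted"
    return frac_positive

frac = sanity_check([[812.0, 815.3, 810.1], [604.2, 0.0, 601.9]])
```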

Code execution. The most basic call to the code is:

[results,denoiseddata] = GLMdenoisedata(design,data,stimdur,tr,[],[],[],'figures');

This command writes diagnostic figures to the directory 'figures' and returns outputs in 'results' and 'denoiseddata'.

Outputs. The code produces the following outputs:

results — a structure containing the results of the GLM analysis (e.g. beta weights) and denoising diagnostics
denoiseddata — a denoised version of the time-series data

For additional details on the code outputs, please see the documentation in GLMdenoisedata.m.

Example scripts

Example dataset. The example scripts make use of an example dataset. The example dataset consists of fMRI measurements in human visual cortex at 3T using gradient-echo, isotropic 2.5-mm voxels, and a TR (sampling rate) of approximately 1.3 s (1.337702 s). The dataset has a dimensionality of 64 voxels x 64 voxels x 4 slices. (The original dataset contains 22 slices, but we have extracted slices 9–12 to make the dataset smaller in size.) The data have been pre-processed, including discarding the first few volumes of each run, slice time correction, motion correction, and spatial undistortion based on fieldmap measurements. Voxels for which a complete set of data is not available (e.g. due to head motion) have been set such that their time-series consist of all 0s. The following variables are contained in the dataset file:

Example scripts. The following links are to HTML pages automatically produced by the MATLAB publish command.

Other notes

Error-checking user inputs. There are currently no safeguards on what happens if the user provides inputs in the wrong format.

Singular matrices. In some cases (e.g., a finite impulse response model with many free parameters and low amounts of data), the design matrix may be singular. The code intentionally crashes if a singular design matrix is encountered.

Reoptimize the noise pool. It is possible that voxels in the noise pool (i.e. voxels with cross-validated R2 less than 0%) will, after denoising, have cross-validated R2 greater than 0%. In theory, one might construct a new noise pool (based on the new identification of which voxels are signal and which are noise), derive new noise regressors, fit new models, and iterate the whole process until convergence. This is an interesting idea, but it is doubtful that it would result in any substantial changes or improvements: the proportion of voxels that escape the noise pool is probably very low in any given experiment, and their time-series data are dominated by noise anyway, so removing those voxels from the noise pool will probably not substantially change the noise regressors that are derived.

Error estimation. To obtain error bars on beta weights, GLMdenoise uses a bootstrapping procedure (i.e. draw runs with replacement from the available set of runs, estimate parameters, and repeat this process a large number of times). Since this process is stochastic, error estimates on beta weights may vary a bit, and in particular, may vary across conditions. If, in subsequent analyses, you wish to convert each voxel's beta weights into t-units, instead of dividing each beta weight by its standard error, you may wish to first calculate the average standard error across conditions (for each voxel) and divide all beta weights (associated with that voxel) by this value. This, of course, presumes that the noise is roughly constant across conditions, but it does have the benefit that random fluctuations in error estimates across conditions will not affect the final values obtained for a voxel. GLMdenoise also provides an option for obtaining parametric GLM fits and associated error estimates.
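The suggested conversion to t-units can be sketched as follows (Python for illustration; the numbers are made up):

```python
# Converting a voxel's beta weights to t-units using the average standard
# error across conditions, rather than each condition's own standard error.
# This presumes the noise is roughly constant across conditions.
def betas_to_t(betas, standard_errors):
    avg_se = sum(standard_errors) / len(standard_errors)
    return [b / avg_se for b in betas]

t_units = betas_to_t([2.0, 4.0, -1.0], [0.8, 1.2, 1.0])  # average SE is 1.0
```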

Multiple sessions. If you are attempting to analyze functional data from different scan sessions, it may be useful to apply GLMdenoise to each session individually and then pool the results across sessions. The reason is that signal and noise characteristics may vary substantially from session to session.

Known issues and caveats

Memory usage. The code consists of large-scale matrix operations applied to many voxels simultaneously. This means that the code is fast but memory-intensive. Currently, there are no internal checks on the amount of memory used by the code. Thus, there is potential for the code to hit the maximum memory available on your machine, thereby causing swapping (and unreasonably slow performance). Memory usage is particularly dependent on the number of bootstraps performed (so you may wish to keep the number of bootstraps low, or set the number of bootstraps to zero). To reduce memory usage, you may also want to set opt.denoisespec to {}, make sure your data are prepared in single format, and/or crop your fMRI volumes. The easiest solution may be to buy more RAM!

Unbalanced designs. A presumption of GLMdenoise is that any given dataset contains multiple runs and that conditions repeat across runs. In addition, there is some presumption that runs are more or less "similar". To understand what this last condition means, consider an extreme case: suppose that some of the runs in a dataset are extremely short (e.g. 30 time points) whereas the remainder are long (e.g. 300 time points). In this case, we expect some inconsistency in the appropriate amount of denoising across the two types of runs. That is, when training the model on the long runs, the number of PCs yielding optimal performance will probably be on the high side, whereas when training on the short runs, the optimal number of PCs will probably be on the low side (i.e. overfitting is likely).

Unbalanced designs and bootstrapping. Unbalanced designs can interact in certain ways with the bootstrapping procedure. For example, suppose that you have two run types such that a given set of conditions occur in one run type and a non-overlapping set of conditions occur in the other run type. In this case, the bootstrap procedure used at the end of the GLMdenoise algorithm may encounter a bootstrap sample that (by chance) does not include any occurrence of a given condition (for example, suppose that all draws in a bootstrap sample were obtained from runs of the first run type). If the code encounters a case where there are no occurrences of a given condition, the code will simply assign a zero weight for those cases. A solution for the issue of unbalanced designs and bootstrapping is to make appropriate use of the opt.bootgroups variable (that is, you can ensure that the bootstrap sample is balanced with respect to the two types of runs).
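The idea behind opt.bootgroups can be sketched as a stratified bootstrap (Python for illustration; the grouping below is hypothetical): runs are drawn with replacement within each run type, so every bootstrap sample contains both run types.

```python
import random

# Sketch of a balanced (stratified) bootstrap over runs, in the spirit of
# opt.bootgroups: draw with replacement *within* each run type so that
# every bootstrap sample includes runs of both types.
def balanced_bootstrap(run_indices_by_type, rng):
    sample = []
    for indices in run_indices_by_type:
        sample.extend(rng.choices(indices, k=len(indices)))
    return sample

rng = random.Random(0)
# Runs 0-2 are one run type, runs 3-5 the other.
sample = balanced_bootstrap([[0, 1, 2], [3, 4, 5]], rng)
```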

Unbalanced designs and cross-validation. The issue of unbalanced designs can also crop up in the leave-one-run-out cross-validation procedure used in GLMdenoise. For example, suppose all occurrences of a certain set of conditions fall within just one of the runs in a dataset. When that run is left out in the cross-validation procedure, no occurrences of those conditions are available to estimate the corresponding beta weights. When this is encountered, the code simply assigns a zero weight for those conditions, thereby resolving the problem (although the situation is a bit strange, since the prediction for the left-out run's conditions is then a time-series consisting of 0s, and denoising will never appear to be useful for that cross-validation iteration).
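Schematically, the leave-one-run-out procedure with the zero-weight fallback looks like this (a Python sketch with hypothetical helper functions, not the toolbox's actual implementation):

```python
# Sketch of the leave-one-run-out loop, with a zero-weight fallback for
# conditions that never occur in the training runs. fit_betas is a
# hypothetical stand-in for the actual GLM fitting step.
def crossvalidate(runs, conditions_in_run, all_conditions, fit_betas):
    results = []
    for left_out in runs:
        training = [r for r in runs if r != left_out]
        seen = set().union(*(conditions_in_run[r] for r in training))
        betas = {c: (fit_betas(training, c) if c in seen else 0.0)
                 for c in all_conditions}
        results.append((left_out, betas))
    return results

# Condition "B" occurs only in run 2, so when run 2 is left out it gets 0.
out = crossvalidate([0, 1, 2], {0: {"A"}, 1: {"A"}, 2: {"A", "B"}},
                    {"A", "B"}, lambda training, c: 1.0)
```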

Bootstrap smaller chunks of data. The strategy for estimating error bars on model parameters is to bootstrap runs (i.e. draw with replacement from the original set of runs). With small numbers of runs, this strategy might result in sub-optimal estimates. One potential alternative is to split runs into small chunks (e.g. 1-min chunks) and bootstrap those chunks instead. This may produce better bootstrap results while still largely respecting the fact that the noise is not independent from time point to time point.
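A sketch of the chunking idea (Python for illustration; the 45-volume chunk length, roughly 1 min at a TR of about 1.3 s, is just an example value):

```python
import random

# Sketch of bootstrapping small chunks of runs instead of whole runs:
# split each run's time points into fixed-length chunks, then draw chunks
# with replacement. Each chunk is (run, start, stop) in volume indices.
def split_into_chunks(n_timepoints_per_run, chunk_len):
    chunks = []
    for run, n in enumerate(n_timepoints_per_run):
        for start in range(0, n, chunk_len):
            chunks.append((run, start, min(start + chunk_len, n)))
    return chunks

def bootstrap_chunks(chunks, rng):
    return rng.choices(chunks, k=len(chunks))

chunks = split_into_chunks([135, 135], chunk_len=45)  # 3 chunks per run
sample = bootstrap_chunks(chunks, random.Random(0))
```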