function results = fitnonlinearmodel(opt,chunksize,chunknum)

<opt> is a struct with the following fields (or a .mat file with 'opt'):

*** OUTPUT DIRECTORY ***
<outputdir> (optional) is the directory to save results to.

*** STIMULUS ***
<stimulus> is:
  (1) a matrix with time points x components
  (2) a cell vector of (1) indicating different runs
  (3) a function that returns (1) or (2)

*** DATA ***
<data> is:
  (1) a matrix with time points x voxels
  (2) a cell vector of (1) indicating different runs
  (3) a function that returns (1) or (2)
  (4) a function that accepts a vector of voxel indices and returns (1) or (2)
      corresponding to those voxels. In this case, <vxs> must be supplied.
<vxs> (optional) is:
  (1) a vector of all voxel indices that you wish to analyze. (If you use
      the chunking mechanism (<chunksize>, <chunknum>), then a subset of these
      voxels are analyzed in any given function call.) Note that we automatically
      sort the voxel indices and ensure uniqueness.
  (2) a .mat file with 'vxs' as (1)
  This input matters only if <data> is of case (4).

*** MODEL ***
<model> is:
  {X Y Z W} where
    X is the initial seed (1 x P).
    Y are the bounds (2 x P). NaNs in the first row indicate parameters to fix.
    Z is a function that accepts two arguments, parameters (1 x P) and
      stimuli (N x C), and outputs predicted responses (N x 1).
    W (optional) is a function that transforms stimuli into a new form prior
      to model evaluation.
  OR
  {M1 M2 M3 ...} where M1 is of the form {X Y Z W} described above,
  and the remaining Mi are of the form {F G H I} where
    F is a function that takes fitted parameters (1 x P) from the previous model
      and outputs an initial seed (1 x Pnew) for the current model.
    G are the bounds (2 x Pnew). NaNs in the first row indicate parameters to fix.
    H is a function that takes fitted parameters (1 x P) from the previous model
      and outputs a function that accepts two arguments, parameters (1 x Pnew) and
      stimuli (N x C), and outputs predicted responses (N x 1).
    I (optional) is a function that takes fitted parameters (1 x P) from the
      previous model and outputs a function that transforms stimuli into a
      new form prior to model evaluation.
  OR
  M where M is a function that takes stimuli (N x C) and responses (N x 1) and
  outputs an estimate of the linear weights (1 x C). For example, simple
  OLS regression is the case where M is @(X,y) (inv(X'*X)*X'*y)'.
  This case is referred to as the linear-model case.

*** SEED ***
<seed> (optional) is:
  (1) the initial seed (1 x P)
  (2) several initial seeds to try (Q x P) in order to find the one that
      produces the least error
  (3) a function that accepts a single voxel index and returns (1) or (2).
      In this case, <vxs> must be supplied.
  If supplied, <seed> overrides the contents of X in <model>.
  In the linear-model case, <seed> is not applicable and should be [].

*** OPTIMIZATION OPTIONS ***
<optimoptions> (optional) are optimization options in the form used by optimset.m.
  Can also be a cell vector with option/value pairs, in which case these are applied
  after the default optimization options. The default options are:
    optimset('Display','iter','FunValCheck','on', ...
             'MaxFunEvals',Inf,'MaxIter',Inf, ...
             'TolFun',1e-6,'TolX',1e-6, ...
             'OutputFcn',@(a,b,c) outputfcnsanitycheck(a,b,c,1e-6,10))
  In particular, it may be useful to specify a specific optimization algorithm to use.
  In the linear-model case, <optimoptions> is ignored.
<outputfcn> (optional) is a function suitable for use as an 'OutputFcn'. If you
  supply <outputfcn>, it will take precedence over any 'OutputFcn' in <optimoptions>.
  The reason for <outputfcn> is that the data points being fitted will be passed as a
  fourth argument to <outputfcn> (if <outputfcn> accepts four arguments). This
  enables some useful functionality such as being able to visualize the model and
  the data during the optimization.
  In the linear-model case, <outputfcn> is ignored.

*** RESAMPLING SCHEMES ***
<wantresampleruns> (optional) is whether to resample at the level of runs (as opposed
  to the level of individual data points). If only one run of data is supplied, the
  default is 0 (resample data points). If more than one run of data is supplied, the
  default is 1 (resample runs).
<resampling> (optional) is:
  0 means fit fully (no bootstrapping or cross-validation).
  B or {B SEED GROUP} indicates to perform B bootstraps, using SEED as the random
    number seed and GROUP as the grouping to use. GROUP should be a vector of
    positive integers. For example, [1 1 1 2 2 2] means to draw six bootstrap
    samples in total, with three bootstrap samples from the first three cases and
    three bootstrap samples from the second three cases. If SEED is not provided,
    the default is sum(100*clock). If GROUP is not provided, the default is ones(1,D)
    where D is the total number of runs or data points.
  V where V is a matrix of dimensions (cross-validation schemes) x (runs or data
    points). Each row indicates a distinct cross-validation scheme, where 1 indicates
    training, -1 indicates testing, and 0 indicates to not use. For example,
    [1 1 -1 -1 0] specifies a scheme where the first two runs (or data points) are
    used for training and the second two runs (or data points) are used for testing.
  Default: 0.

*** METRIC ***
<metric> (optional) determines how model performance is quantified. <metric> should
  be a function that accepts two column vectors (the first is the model; the second
  is the data) and outputs a number. Default: @calccorrelation.

*** ADDITIONAL REGRESSORS ***
<maxpolydeg> (optional) is a non-negative integer with the maximum polynomial degree
  to use for polynomial nuisance functions. The polynomial nuisance functions are
  constructed on a per-run basis. <maxpolydeg> can be a vector to indicate different
  degrees for different runs. A special case is NaN, which means to omit polynomials.
  Default: NaN.
<wantremovepoly> (optional) is whether to project the polynomials out from both the
  model and the data before computing <metric>. Default: 1.
<extraregressors> (optional) is:
  (1) a matrix with time points x regressors
  (2) a cell vector of (1) indicating different runs
  (3) a function that returns (1) or (2)
  Note that a separate set of regressors must be supplied for each run. The number
  of regressors does not have to be the same across runs.
<wantremoveextra> (optional) is whether to project the extra regressors out from
  both the model and the data before computing <metric>. Default: 1.

*** OUTPUT-RELATED ***
<dontsave> (optional) is a string or a cell vector of strings indicating outputs
  to omit when returning. For example, you may want to omit 'testdata', 'modelpred',
  'modelfit', 'numiters', and 'resnorms' since they may use a lot of memory.
  If [] or not supplied, then we use the default of {'modelfit' 'numiters' 'resnorms'}.
  If {}, then we will return all outputs. Note: <dontsave> can also refer to
  auxiliary variables that are saved to the .mat files when <outputdir> is used.
<dosave> (optional) is just like <dontsave> except that the outputs specified here
  are guaranteed to be returned. (<dosave> takes precedence over <dontsave>.)
  Default is {}.

<chunksize> (optional) is the number of voxels to process in a single function call.
  The default is to process all voxels.
<chunknum> (optional) is the chunk number to process. Default: 1.

This function, fitnonlinearmodel.m, is essentially a wrapper around MATLAB's
lsqcurvefit.m function for the purposes of fitting nonlinear (and linear) models
to data.

This function provides the following key benefits:
- Deals with input and output issues (making it easy to process many individual
  voxels and evaluate different models)
- Deals with resampling (cross-validation and bootstrapping)
- In the case of nonlinear models, makes it easy to evaluate multiple initial
  seeds (to avoid local minima)
- In the case of nonlinear models, makes it easy to perform stepwise fitting of models

Outputs:
- 'params' is resampling cases x parameters x voxels.
  These are the estimated parameters from each resampling scheme for each voxel.
- 'trainperformance' is resampling cases x voxels.
  This is the performance of the model on the training data under each resampling
  scheme for each voxel.
- 'testperformance' is resampling cases x voxels.
  This is the performance of the model on the testing data under each resampling
  scheme for each voxel.
- 'aggregatedtestperformance' is 1 x voxels.
  This is the performance of the model on the testing data, after aggregating
  the data and model predictions across the resampling schemes.
- 'testdata' is time points x voxels.
  This is the aggregated testing data across the resampling schemes.
- 'modelpred' is time points x voxels.
  This is the aggregated model predictions across the resampling schemes.
- 'modelfit' is resampling cases x time points x voxels.
  This is the model fit for each resampling scheme. Here, by "model fit"
  we mean the fit for each of the original stimuli based on the parameters
  estimated in a given resampling case; we do not mean the fit for each of the
  stimuli involved in the fitting. (For example, if there are 100 stimuli and
  we are performing cross-validation, there will nevertheless be 100 time points
  in 'modelfit'.) Also, note that 'modelfit' is the raw fit; it is not adjusted
  for <wantremovepoly> and <wantremoveextra>.
- 'numiters' is a cell vector of dimensions 1 x voxels. Each element is
  resampling cases x seeds x models. These are the numbers of iterations
  used in the optimizations. Note that 'numiters' is [] in the linear-model case.
- 'resnorms' is a cell vector of dimensions 1 x voxels. Each element is
  resampling cases x seeds. These are the residual norms obtained in the
  optimizations. This is useful for diagnosing multiple-seed issues.
  Note that 'resnorms' is [] in the linear-model case.

Notes:
- Since we use %06d.mat to name output files, you should use no more than 999,999 chunks.
- <chunksize> and <chunknum> can be strings (if so, they will be passed to str2double).
- <stimulus> can actually have multiple frames in the third dimension. This is handled
  by making it such that the prediction for a given data point is calculated as the
  average of the predicted responses for the individual stimulus frames associated with
  that data point.
- In the case of nonlinear models, to control the scale of the computations, in the
  optimization call we divide the data by its standard deviation and apply the exact
  same scaling to the model. This has the effect of controlling the scale of the
  residuals. This last-minute scaling should have no effect on the final parameter estimates.

History:
- 2014/05/01 - change the main loop to parfor; some cosmetic tweaks;
               now, if no parameters are to be optimized, just return the initial seed
- 2013/10/02 - implement the linear-model case
- 2013/09/07 - fix bug (if polynomials or extra regressors were used in multiple runs,
               then they were not getting fit properly)
- 2013/09/07 - in fitnonlinearmodel_helper.m, convert to double in the call to
               lsqcurvefit; and perform a speed-up (don't compute modelfit if unwanted)
- 2013/09/04 - add totalnumvxs variable
- 2013/09/03 - allow <dontsave> to refer to the auxiliary variables
- 2013/09/02 - add 'modelfit' and adjust default for 'dontsave'; add 'dosave'
- 2013/08/28 - new outputs 'resnorms' and 'numiters'; last-minute data scaling;
               tweak default handling of 'dontsave'
- 2013/08/18 - Initial version.

Example 1:

  % first, a simple example
  x = randn(100,1);
  y = 2*x + 3 + randn(100,1);
  opt = struct( ...
    'stimulus',[x ones(100,1)], ...
    'data',y, ...
    'model',{{[1 1] [-Inf -Inf; Inf Inf] @(pp,dd) dd*pp'}});
  results = fitnonlinearmodel(opt);

  % now, try 100 bootstraps
  opt.resampling = 100;
  opt.optimoptions = {'Display' 'off'};  % turn off reporting
  results = fitnonlinearmodel(opt);

  % now, try leave-one-out cross-validation
  opt.resampling = -(2*(eye(100) - 0.5));
  results = fitnonlinearmodel(opt);

Example 2:

  % try a more complicated example. we use 'outputfcn' to
  % visualize the data and model during the optimization.
  x = (1:.1:10)';
  y = evalgaussian1d([5 1 4 0],x);
  y = y + randn(size(y));
  opt = struct( ...
    'stimulus',x, ...
    'data',y, ...
    'model',{{[1 2 1 0] [repmat(-Inf,[1 4]); repmat(Inf,[1 4])] ...
              @(pp,dd) evalgaussian1d(pp,dd)}}, ...
    'outputfcn',@(a,b,c,d) pause2(.1) | outputfcnsanitycheck(a,b,c,1e-6,10) | outputfcnplot(a,b,c,1,d));
  results = fitnonlinearmodel(opt);

Example 3:

  % same as the first example in Example 1, but now we use the
  % linear-model functionality
  x = randn(100,1);
  y = 2*x + 3 + randn(100,1);
  opt = struct( ...
    'stimulus',[x ones(100,1)], ...
    'data',y, ...
    'model',@(X,y) (inv(X'*X)*X'*y)');
  results = fitnonlinearmodel(opt);
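To make the <optimoptions> cell-vector form concrete, here is a minimal sketch (the option names are standard optimset options; the particular values are illustrative only, not taken from the original text):

```matlab
% Quiet the per-iteration output and tighten the function tolerance.
% These option/value pairs are applied on top of the defaults listed above.
opt.optimoptions = {'Display' 'off' 'TolFun' 1e-10};
```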
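The stepwise {M1 M2 ...} form of <model> is not covered by the examples above, so here is a hypothetical sketch in the spirit of Example 2. It assumes that the last two parameters of evalgaussian1d can be sensibly fixed at [1 0] during a first pass; that assumption, and the particular seed values, are illustrative only:

```matlab
% Hypothetical two-stage fit: stage 1 fits only the first two Gaussian
% parameters with the last two fixed at [1 0]; stage 2 seeds from the
% stage-1 fit and frees all four parameters.
opt.model = { ...
  {[5 2] [-Inf -Inf; Inf Inf] @(pp,dd) evalgaussian1d([pp 1 0],dd)} ...
  {@(ss) [ss 1 0] [repmat(-Inf,[1 4]); repmat(Inf,[1 4])] ...
   @(ss) @(pp,dd) evalgaussian1d(pp,dd)} };
```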
function results = fitnonlinearmodel(opt,chunksize,chunknum)

% (See the documentation above for a full description of <opt>, <chunksize>,
% and <chunknum>.)
%
% internal notes:
% - replaces fitprf.m, fitprfstatic.m, fitprfmulti.m, and fitprfstaticmulti.m
% - some of the new features: opt struct format, fix projection matrix bug (must
%   compute projection matrix based on concatenated regressors), multiple initial
%   seeds are handled internally!, user must deal with model specifics like
%   the HRF and positive rectification, massive clean up of the logic (e.g.
%   runs and data points are treated as a single case), consolidation of
%   the different functions, drop support for data trials (not worth the
%   implementation cost), drop support for NaN stimulus frames, hide the
%   myriad optimization options from the input level, drop run-separated metrics,
%   drop the stimulus transformation speed-up (it was getting implemented in a
%   non-general way)
% - regularization is its own thing? own code module?

%%%%%%%%%%%%%%%%%%%%%%%%%%%% REPORT

fprintf('*** fitnonlinearmodel: started at %s. ***\n',datestr(now));
stime = clock;  % start time

%%%%%%%%%%%%%%%%%%%%%%%%%%%% SETUP

% deal with opt
if ischar(opt)
  opt = loadmulti(opt,'opt');
end

% is <data> of case (4)?
isvxscase = isa(opt.data,'function_handle') && nargin(opt.data) > 0;

% deal with outputdir
if ~isfield(opt,'outputdir') || isempty(opt.outputdir)
  opt.outputdir = [];
end
wantsave = ~isempty(opt.outputdir);  % should we save results to disk?

% deal with vxs
if isfield(opt,'vxs')
  if ischar(opt.vxs)
    vxsfull = loadmulti(opt.vxs,'vxs');
  else
    vxsfull = opt.vxs;
  end
  vxsfull = sort(union([],flatten(vxsfull)));
  totalnumvxs = length(vxsfull);
end

% deal with chunksize and chunknum
if ~exist('chunksize','var') || isempty(chunksize)
  chunksize = [];  % deal with this later
end
if ~exist('chunknum','var') || isempty(chunknum)
  chunknum = 1;
end
if ischar(chunksize)
  chunksize = str2double(chunksize);
end
if ischar(chunknum)
  chunknum = str2double(chunknum);
end

% deal with data (including load the data)
if isa(opt.data,'function_handle')
  fprintf('*** fitnonlinearmodel: loading data. ***\n');
  if nargin(opt.data) == 0
    data = feval(opt.data);
    if iscell(data)
      totalnumvxs = size(data{1},2);
    else
      totalnumvxs = size(data,2);
    end
  else  % note that in this case, vxs should have been specified,
        % so totalnumvxs should have already been calculated above.
    if isempty(chunksize)
      chunksize = length(vxsfull);
    end
    vxs = chunking(vxsfull,chunksize,chunknum);
    data = feval(opt.data,vxs);
  end
else
  data = opt.data;
  if iscell(data)
    totalnumvxs = size(data{1},2);
  else
    totalnumvxs = size(data,2);
  end
end
if ~iscell(data)
  data = {data};
end

% deal with chunksize
if isempty(chunksize)
  chunksize = totalnumvxs;  % default is all voxels
end

% if not isvxscase, then we may still need to do chunking
if ~isvxscase
  vxs = chunking(1:totalnumvxs,chunksize,chunknum);
  if ~isequal(vxs,1:totalnumvxs)
    for p=1:length(data)
      data{p} = data{p}(:,vxs);
    end
  end
end

% calculate the number of voxels to analyze in this function call
vnum = length(vxs);

% finally, since we have dealt with chunksize and chunknum, we can do some reporting
fprintf(['*** fitnonlinearmodel: outputdir = %s, chunksize = %d, chunknum = %d\n'], ...
  opt.outputdir,chunksize,chunknum);

% deal with model
if ~isa(opt.model,'function_handle') && ~iscell(opt.model{1})
  opt.model = {opt.model};
end
if ~isa(opt.model,'function_handle')
  for p=1:length(opt.model)
    if length(opt.model{p}) < 4 || isempty(opt.model{p}{4})
      if p==1
        opt.model{p}{4} = @identity;
      else
        opt.model{p}{4} = @(ss) @identity;
      end
    end
  end
end

% deal with seed
if ~isfield(opt,'seed') || isempty(opt.seed)
  opt.seed = [];
end

% deal with optimization options
if ~isfield(opt,'optimoptions') || isempty(opt.optimoptions)
  opt.optimoptions = {};
end
if iscell(opt.optimoptions)
  temp = optimset('Display','iter','FunValCheck','on', ...
                  'MaxFunEvals',Inf,'MaxIter',Inf, ...
                  'TolFun',1e-6,'TolX',1e-6, ...
                  'OutputFcn',@(a,b,c) outputfcnsanitycheck(a,b,c,1e-6,10));
  for p=1:length(opt.optimoptions)/2
    temp.(opt.optimoptions{(p-1)*2+1}) = opt.optimoptions{(p-1)*2+2};
  end
  opt.optimoptions = temp;
  clear temp;
end
if ~isfield(opt,'outputfcn') || isempty(opt.outputfcn)
  opt.outputfcn = [];
end

% deal with resampling schemes
if ~isfield(opt,'wantresampleruns') || isempty(opt.wantresampleruns)
  if length(data) == 1
    opt.wantresampleruns = 0;
  else
    opt.wantresampleruns = 1;
  end
end
if opt.wantresampleruns
  numdataunits = length(data);
else
  numdataunits = sum(cellfun(@(x) size(x,1),data));
end
if ~isfield(opt,'resampling') || isempty(opt.resampling)
  opt.resampling = 0;
end
if isequal(opt.resampling,0)
  resamplingmode = 'full';
elseif ~iscell(opt.resampling) && numel(opt.resampling) > 1
  resamplingmode = 'xval';
else
  resamplingmode = 'boot';
end
if isequal(resamplingmode,'boot')
  if ~iscell(opt.resampling)
    opt.resampling = {opt.resampling};
  end
  if length(opt.resampling) < 2 || isempty(opt.resampling{2})
    opt.resampling{2} = sum(100*clock);
  end
  if length(opt.resampling) < 3 || isempty(opt.resampling{3})
    opt.resampling{3} = ones(1,numdataunits);
  end
end

% deal with metric
if ~isfield(opt,'metric') || isempty(opt.metric)
  opt.metric = @calccorrelation;
end

% deal with additional regressors
if ~isfield(opt,'maxpolydeg') || isempty(opt.maxpolydeg)
  opt.maxpolydeg = NaN;
end
if length(opt.maxpolydeg) == 1
  opt.maxpolydeg = repmat(opt.maxpolydeg,[1 length(data)]);
end
if ~isfield(opt,'wantremovepoly') || isempty(opt.wantremovepoly)
  opt.wantremovepoly = 1;
end
if ~isfield(opt,'extraregressors') || isempty(opt.extraregressors)
  opt.extraregressors = [];
end
if ~isfield(opt,'wantremoveextra') || isempty(opt.wantremoveextra)
  opt.wantremoveextra = 1;
end
if ~isfield(opt,'dontsave') || (isempty(opt.dontsave) && ~iscell(opt.dontsave))
  opt.dontsave = {'modelfit' 'numiters' 'resnorms'};
end
if ~iscell(opt.dontsave)
  opt.dontsave = {opt.dontsave};
end
if ~isfield(opt,'dosave') || isempty(opt.dosave)
  opt.dosave = {};
end
if ~iscell(opt.dosave)
  opt.dosave = {opt.dosave};
end

% make outputdir if necessary
if wantsave
  mkdirquiet(opt.outputdir);
  opt.outputdir = subscript(matchfiles(opt.outputdir),1,1);
  outputfile = sprintf([opt.outputdir '/%06d.mat'],chunknum);
end

% set random number seed
if isequal(resamplingmode,'boot')
  setrandstate({opt.resampling{2}});
end

% calc
numtime = cellfun(@(x) size(x,1),data);

% save initial time
if wantsave
  saveexcept(outputfile,[{'data'} setdiff(opt.dontsave,opt.dosave)]);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%% LOAD SOME ITEMS

% deal with stimulus
if isa(opt.stimulus,'function_handle')
  fprintf('*** fitnonlinearmodel: loading stimulus. ***\n');
  stimulus = feval(opt.stimulus);
else
  stimulus = opt.stimulus;
end
if ~iscell(stimulus)
  stimulus = {stimulus};
end
stimulus = cellfun(@full,stimulus,'UniformOutput',0);

% deal with extraregressors
if isa(opt.extraregressors,'function_handle')
  fprintf('*** fitnonlinearmodel: loading extra regressors. ***\n');
  extraregressors = feval(opt.extraregressors);
else
  extraregressors = opt.extraregressors;
end
if isempty(extraregressors)
  extraregressors = repmat({[]},[1 length(data)]);
end
if ~iscell(extraregressors)
  extraregressors = {extraregressors};
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%% PRECOMPUTE SOME STUFF

% construct polynomial regressors matrix
polyregressors = {};
for p=1:length(data)
  if isnan(opt.maxpolydeg(p))
    polyregressors{p} = zeros(numtime(p),0);
  else
    polyregressors{p} = constructpolynomialmatrix(numtime(p),0:opt.maxpolydeg(p));
  end
end

% construct total regressors matrix for fitting purposes
% (i.e. both polynomials and extra regressors)
% first, construct the run-wise regressors
tmatrix = {};
for p=1:length(data)
  tmatrix{p} = cat(2,polyregressors{p},extraregressors{p});
end
% then, separate them using blkdiag
temp = blkdiag(tmatrix{:});
cnt = 0;
for p=1:length(data)
  tmatrix{p} = temp(cnt+(1:size(tmatrix{p},1)),:);
  cnt = cnt + size(tmatrix{p},1);
end
clear temp;

% construct special regressors matrix for the purposes of the <metric>
% first, construct the run-wise regressors
smatrix = {};
for p=1:length(data)
  temp = [];
  if opt.wantremovepoly
    temp = cat(2,temp,polyregressors{p});
  end
  if opt.wantremoveextra
    temp = cat(2,temp,extraregressors{p});
  end
  smatrix{p} = temp;
end
% then, separate them using blkdiag
temp = blkdiag(smatrix{:});
cnt = 0;
for p=1:length(data)
  smatrix{p} = temp(cnt+(1:size(smatrix{p},1)),:);
  cnt = cnt + size(smatrix{p},1);
end
clear temp;

% figure out trainfun and testfun for resampling
switch resamplingmode
case 'full'
  trainfun = {@(x) catcell(1,x)};
  testfun = {@(x) []};
case 'xval'
  trainfun = {};
  testfun = {};
  for p=1:size(opt.resampling,1)
    trainix = find(opt.resampling(p,:) == 1);
    testix = find(opt.resampling(p,:) == -1);
    if opt.wantresampleruns
      trainfun{p} = @(x) catcell(1,x(trainix));
      testfun{p} = @(x) catcell(1,x(testix));
    else
      trainfun{p} = @(x) subscript(catcell(1,x),{trainix ':' ':' ':' ':' ':'});  % HACKY
      testfun{p} = @(x) subscript(catcell(1,x),{testix ':' ':' ':' ':' ':'});
    end
  end
case 'boot'
  trainfun = {};
  testfun = {};
  for p=1:opt.resampling{1}
    trainix = [];
    for b=1:max(opt.resampling{3})
      temp = opt.resampling{3}==b;
      trainix = [trainix subscript(find(temp),ceil(rand(1,sum(temp))*sum(temp)))];
    end
    testix = [];
    if opt.wantresampleruns
      trainfun{p} = @(x) catcell(1,x(trainix));
      testfun{p} = @(x) catcell(1,x(testix));
    else
      trainfun{p} = @(x) subscript(catcell(1,x),{trainix ':' ':' ':' ':' ':'});  % HACKY
      testfun{p} = @(x) subscript(catcell(1,x),{testix ':' ':' ':' ':' ':'});
    end
  end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%% PERFORM THE FITTING

% loop over voxels
clear results0;
parfor p=1:vnum

  % report
  fprintf('*** fitnonlinearmodel: processing voxel %d (%d of %d). ***\n',vxs(p),p,vnum);
  vtime = clock;  % start time for current voxel

  % get data and hack it in
  opt2 = opt;
  opt2.data = cellfun(@(x) x(:,p),data,'UniformOutput',0);

  % get seed and hack it in
  if ~isempty(opt2.seed)
    assert(~isa(opt2.model,'function_handle'));  % sanity check
    if isa(opt2.seed,'function_handle')
      seed0 = feval(opt2.seed,vxs(p));
    else
      seed0 = opt2.seed;
    end
    opt2.model{1}{1} = seed0;
  end

  % call helper function to do the actual work
  results0(p) = fitnonlinearmodel_helper(opt2,stimulus,tmatrix,smatrix,trainfun,testfun);

  % report
  fprintf('*** fitnonlinearmodel: voxel %d (%d of %d) took %.1f seconds. ***\n', ...
    vxs(p),p,vnum,etime(clock,vtime));

end

% consolidate results
results = struct;
results.params = cat(3,results0.params);
results.testdata = cat(2,results0.testdata);
results.modelpred = cat(2,results0.modelpred);
results.modelfit = cat(3,results0.modelfit);
results.trainperformance = cat(1,results0.trainperformance).';
results.testperformance = cat(1,results0.testperformance).';
results.aggregatedtestperformance = cat(2,results0.aggregatedtestperformance);
results.numiters = cat(2,{results0.numiters});
results.resnorms = cat(2,{results0.resnorms});

% kill unwanted outputs
for p=1:length(opt.dontsave)

  % if member of dosave, keep it!
  if ismember(opt.dontsave{p},opt.dosave)

  % if not, then kill it (if it exists)!
  else
    if isfield(results,opt.dontsave{p})
      results = rmfield(results,opt.dontsave{p});
    end
  end

end

% save results
if wantsave
  save(outputfile,'-struct','results','-append');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%% REPORT

fprintf('*** fitnonlinearmodel: ended at %s (%.1f minutes). ***\n', ...
  datestr(now),etime(clock,stime)/60);
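%%%%%%%%%%%%%%%%%%%%%%%%%%%% EXAMPLE

% The following is a minimal sketch of a direct call, based on the interface
% documented at the top of this file. The stimulus/data values and the simple
% linear model below are hypothetical and for illustration only:
%
%   opt = struct();
%   opt.stimulus = randn(100,3);   % time points x components
%   opt.data = randn(100,2);       % time points x voxels
%   opt.model = {[1 1 1] [-Inf(1,3); Inf(1,3)] ...
%                @(pp,ss) ss*pp'};  % seed (1 x P), bounds (2 x P), model function
%   results = fitnonlinearmodel(opt);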