function f = calccorrelation(x,y,dim,wantmeansubtract,wantgainsensitive)

% function f = calccorrelation(x,y,dim,wantmeansubtract,wantgainsensitive)
%
% <x>,<y> are matrices with the same dimensions
% <dim> (optional) is the dimension of interest.
%   default to 2 if <x> is a (horizontal) vector and to 1 if not.
%   special case is 0 which means to calculate globally.
% <wantmeansubtract> (optional) is whether to subtract mean first. default: 1.
% <wantgainsensitive> (optional) is whether to make the metric sensitive
%   to gain. default: 0.
%
% calculate Pearson's correlation coefficient between <x> and <y> for
% each case oriented along <dim>. this is achieved by taking
% the two vectors implicated in each case, subtracting each vector's mean,
% normalizing each vector to have unit length, and then taking their dot product.
%
% a special case is when <wantmeansubtract>==0, in which case we omit the
% mean-subtraction. this technically isn't Pearson's correlation any more.
%
% another special case is <wantgainsensitive>, in which case we take the two
% vectors implicated in each case, subtract each vector's mean (if
% <wantmeansubtract>), normalize the two vectors by the same constant
% such that the larger of the two vectors has length 1, and then take the
% dot product. note that this metric still ranges from -1 to 1 and that any
% gain mismatch can only hurt you (i.e. push you towards a correlation of 0).
% (in retrospect, we now observe that it is probably better to use calccod.m
% in order to be sensitive to gain, rather than to use <wantgainsensitive>
% in this function.)
%
% if there is no variance in one (or more) of the inputs, the
% result is returned as NaN.
%
% NaNs are handled gracefully (a NaN causes that data point to be ignored).
%
% if there are no valid data points (i.e. all data points are
% ignored because of NaNs), we return NaN for that case.
%
% we do not use caution with respect to cases involving low variance (see unitlength.m).
%
% be careful of the presumption that the mean and scale of <x> and <y> can be
% discounted. if you do not want to perform this discounting, use calccod.m instead!
%
% note some weird cases:
%   calccorrelation([],[]) is []
%
% example:
% calccorrelation(randn(1,100),randn(1,100))

% input
if ~exist('dim','var') || isempty(dim)
  if isvector(x) && size(x,1)==1
    dim = 2;
  else
    dim = 1;
  end
end
if ~exist('wantmeansubtract','var') || isempty(wantmeansubtract)
  wantmeansubtract = 1;
end
if ~exist('wantgainsensitive','var') || isempty(wantgainsensitive)
  wantgainsensitive = 0;
end

% handle weird case up front
if isempty(x)
  f = [];
  return;
end

% handle 0 case
if dim==0
  x = x(:);
  y = y(:);
  dim = 1;
end

% propagate NaNs (i.e. ignore invalid data points)
x(isnan(y)) = NaN;
y(isnan(x)) = NaN;

% subtract mean
if wantmeansubtract
  x = zeromean(x,dim);
  y = zeromean(y,dim);
end

% normalize x and y to be unit length
[xnorm,xlen] = unitlength(x,dim,[],0);
[ynorm,ylen] = unitlength(y,dim,[],0);

% if gain sensitive, then do something tricky:
if wantgainsensitive

  % case where x is the bigger vector
  temp = xnorm .* bsxfun(@(x,y) zerodiv(x,y,NaN,0),y,xlen);
  bad = all(isnan(temp),dim);
  f1 = nansum(temp,dim);
  f1(bad) = NaN;

  % case where y is the bigger vector
  temp = bsxfun(@(x,y) zerodiv(x,y,NaN,0),x,ylen) .* ynorm;
  bad = all(isnan(temp),dim);
  f2 = nansum(temp,dim);
  f2(bad) = NaN;

  % figure out which one to use
  f = f1;                            % this is when x is bigger
  f(xlen < ylen) = f2(xlen < ylen);  % this is when y is bigger

% if not, just take the dot product
else
  temp = xnorm .* ynorm;
  % at this point, we want to sum along dim. however,
  % if a case has all NaNs, we need the output to be NaN.
  bad = all(isnan(temp),dim);  % record bad cases
  f = nansum(temp,dim);        % do the summation
  f(bad) = NaN;                % explicitly set bad cases to NaN
end
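
% For readers who want the same computation outside MATLAB, the basic
% (non-gain-sensitive) path above can be sketched in NumPy. This is an
% illustrative analogue, not part of this codebase: the function name is
% hypothetical, and the MATLAB helpers zeromean.m/unitlength.m/zerodiv.m
% are approximated with nan-aware NumPy operations.

```python
import numpy as np

def calccorrelation_sketch(x, y, dim=0):
    """Rough NumPy analogue of calccorrelation.m (basic path only).

    Mean-subtracts along `dim`, normalizes each vector to unit length,
    and takes the dot product. A NaN in either input causes that data
    point to be ignored; zero-variance or all-NaN cases return NaN.
    """
    x = np.array(x, dtype=float)
    y = np.array(y, dtype=float)

    # propagate NaNs so invalid points are ignored in both inputs
    bad = np.isnan(x) | np.isnan(y)
    x[bad] = np.nan
    y[bad] = np.nan

    # subtract each vector's mean (ignoring NaNs)
    x = x - np.nanmean(x, axis=dim, keepdims=True)
    y = y - np.nanmean(y, axis=dim, keepdims=True)

    # normalize to unit length; zero-length (no-variance) vectors become NaN
    xlen = np.sqrt(np.nansum(x**2, axis=dim, keepdims=True))
    ylen = np.sqrt(np.nansum(y**2, axis=dim, keepdims=True))
    with np.errstate(invalid='ignore', divide='ignore'):
        xnorm = np.where(xlen > 0, x / xlen, np.nan)
        ynorm = np.where(ylen > 0, y / ylen, np.nan)

    # dot product, with all-NaN cases explicitly set to NaN
    temp = xnorm * ynorm
    return np.where(np.all(np.isnan(temp), axis=dim),
                    np.nan,
                    np.nansum(temp, axis=dim))
```

% Usage mirrors the MATLAB example: perfectly linearly related vectors give 1,
% anti-related vectors give -1, and a constant input gives NaN.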