Statsmodels Methods Overview
[1]:
# This cells setups the environment when executed in Google Colab.
try:
import google.colab
!curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/dev/scripts/colab_setup.py -o colab_setup.py
# Select branch with --branch "branch name" (default is "dev")
%run colab_setup.py
except ImportError:
pass
Cedalion uses statsmodels for its GLM fitting functionality, and this notebook gives an overview of some common statsmodels methods. The glm.fit function returns an xr.DataArray of statsmodels RegressionResults objects with dimensions (channel, chromo). Any RegressionResults method can be called on this DataArray using the .sm accessor. A full list of available methods and attribute can be found in the statsmodels documentation.
In this notebook, we’ll assume that we have already loaded our data and set up the GLM. See the other GLM notebooks for details on setup.
We’ll start by fitting the GLM and displaying the resulting object.
[3]:
results = glm.fit(ts_long, dms)
display(results)
67%|██████▋ | 2/3 [00:00<00:00, 154.85it/s]
67%|██████▋ | 2/3 [00:00<00:00, 158.68it/s]
50%|█████ | 1/2 [00:00<00:00, 82.40it/s]
50%|█████ | 1/2 [00:00<00:00, 84.03it/s]
67%|██████▋ | 2/3 [00:00<00:00, 157.67it/s]
67%|██████▋ | 2/3 [00:00<00:00, 168.49it/s]
50%|█████ | 1/2 [00:00<00:00, 84.39it/s]
50%|█████ | 1/2 [00:00<00:00, 86.31it/s]
67%|██████▋ | 2/3 [00:00<00:00, 166.11it/s]
67%|██████▋ | 2/3 [00:00<00:00, 161.85it/s]
50%|█████ | 1/2 [00:00<00:00, 87.25it/s]
50%|█████ | 1/2 [00:00<00:00, 86.61it/s]
67%|██████▋ | 2/3 [00:00<00:00, 168.74it/s]
67%|██████▋ | 2/3 [00:00<00:00, 167.89it/s]
50%|█████ | 1/2 [00:00<00:00, 84.58it/s]
50%|█████ | 1/2 [00:00<00:00, 87.11it/s]
<xarray.DataArray (channel: 20, chromo: 2)> Size: 320B <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f5... Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
The results object is a DataArray of statsmodels RegressionResults: one for each channel/chromophore. In order to call a method on our results object, we just use the accessor .sm
, followed by the RegressionResults method. Cedalion handles calling the method on each individual RegressionResults object in the results DataArray, and returns the outputs in a new DataArray with the
appropriate dimensions. This allows us to get information on all channels simply and concisely.
Beta Coefficients (params)
First, we’ll retreive the coefficients of the GLM fit.
[4]:
results.sm.params
[4]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB -0.001429 0.1604 0.2017 0.7729 -0.003184 ... -0.01349 -0.01729 -0.01956 -0.03845 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
Standard Error
The method bse returns the standard errors of the parameter estimates
[5]:
results.sm.bse
[5]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB 0.004972 0.005014 0.005029 0.005514 0.0024 ... 0.00284 0.00284 0.002842 0.006336 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
Confidence Intervals (conf_int)
The method conf_int calculates the confidence interval of the fitted parameters. We can specify the alpha level for the confidence interval (default 5%).
In the output, the index conf_int marks the low (conf_int=0) and high (conf_int=1) endpoints of the confidence interval.
[6]:
results.sm.conf_int(alpha=0.05)
[6]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4, conf_int: 2)> Size: 3kB -0.01117 0.008315 0.1505 0.1702 0.1918 ... -0.02513 -0.01399 -0.05087 -0.02603 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * conf_int (conf_int) int64 16B 0 1 * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
Covariance and Variance
The method cov_params computes the covariance matrix. Note that we can recover the variances from the diagonal elements of the matrix.
[7]:
results.sm.cov_params()
[7]:
<xarray.DataArray (channel: 20, chromo: 2, regressor_r: 4, regressor_c: 4)> Size: 5kB 2.472e-05 -4.509e-08 -5.209e-08 3.787e-07 ... 2.327e-07 7.423e-07 4.014e-05 Coordinates: * regressor_r (regressor_r) object 32B 'HRF control' ... 'short' * regressor_c (regressor_c) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
Here we visualize the covariance matrix for a single regressor:
[8]:
p.imshow(results.sm.cov_params()[0,0,:,:]);

The convenience function sm.regressor_variances
computes the variances of the regressors, i.e. the diagonal elements of the covariance matrices.
[9]:
# returns diagonal elements of the cov matrices
results.sm.regressor_variances()
[9]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB 2.472e-05 2.514e-05 2.529e-05 3.04e-05 ... 8.064e-06 8.076e-06 4.014e-05 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
Statistical Tests - T-values, P-values
Statsmodels also has sophisticated functionality for performing statistical tests on regression results.
The method t-values simply returns the t-statistic for each coefficient.
[10]:
results.sm.tvalues
[10]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB -0.2875 31.98 40.1 140.2 -1.327 -17.83 ... 97.04 -4.749 -6.089 -6.882 -6.069 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
The method t-test allows for general linear hypothesis tests.
We can specify contrasts either by passing an r-matrix or through strings. See the patsy documentation for details on specifying linear contrasts using strings.
The method returns an array of statsmodel ContrastResult objects.
[11]:
# Specifying hypotheses for t-test as string
hypotheses = "HRF Tapping/Left = HRF control, HRF Tapping/Right = HRF control"
contrast_results = results.sm.t_test(hypotheses)
display(contrast_results)
<xarray.DataArray (channel: 20, chromo: 2)> Size: 320B Test for Constraints ... Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Attributes: description: OLS model via statsmodels.regression
We can use the .sm
accessor on the resulting DataArray
of ContrastResult
objects, just like we did before with the RegressionResult
arrays.
The convenience functions sm.tvalues()
and sm.pvalues()
return the t- and p-values of the contrast, respectively.
The sm.map
method works analogously to the map function in python, applying a given function to each cell of the DataArray.
Below, we extract the t-values of the contrast using first the map method and then the convenience function.
[12]:
# Extracting t-values from the contrast results
display(contrast_results.sm.map(lambda i : i.tvalue, name="hypothesis"))
display(contrast_results.sm.t_values())
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B 22.89 28.69 -11.67 -14.53 18.91 20.18 ... -12.03 -3.51 11.8 9.7 -0.947 -1.511 Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Dimensions without coordinates: hypothesis Attributes: description: OLS model via statsmodels.regression
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B 22.89 28.69 -11.67 -14.53 18.91 20.18 ... -12.03 -3.51 11.8 9.7 -0.947 -1.511 Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Dimensions without coordinates: hypothesis Attributes: description: OLS model via statsmodels.regression
[13]:
# Extracting p-values from the contrast results
display(contrast_results.sm.p_values())
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B 9.929e-115 6.125e-178 2.164e-31 1.18e-47 ... 5.073e-32 3.313e-22 0.3436 0.1307 Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Dimensions without coordinates: hypothesis Attributes: description: OLS model via statsmodels.regression
Plotting Uncertainty Bands
In this section, we explore a technique, still in development, for visualizing uncertainty in a GLM with many regressors. This method visualizes the uncertainty in a GLM fit by drawing multiple samples of the beta coefficients from their estimated covariance (via multivariate normal sampling). It then uses these sampled betas to generate predicted time courses, and plots the mean prediction with a shaded band representing ±3 standard deviations across samples, thus capturing the variability due to model uncertainty. The band is quite narrow because uncertainty is low in this toy example.
FIXME: Band even smaller than before?
[14]:
# Sample betas
betas = results.sm.params
cov = results.sm.cov_params()
sampled_betas = xr.zeros_like(betas).expand_dims({"sample" : 100}, axis=-1).copy()
for i_ch in range(sampled_betas.shape[0]):
for i_cr in range(sampled_betas.shape[1]):
sampled_betas[i_ch, i_cr, :, :] = np.random.multivariate_normal(
betas[i_ch, i_cr, :],
cov[i_ch, i_cr, :, :],
size=100,
).T
display(sampled_betas)
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4, sample: 100)> Size: 128kB -0.004786 -0.001207 -0.006483 0.002115 ... -0.054 -0.04183 -0.03155 -0.04055 Coordinates: * regressor (regressor) object 32B 'HRF control' ... 'short' * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8' Dimensions without coordinates: sample Attributes: description: OLS model via statsmodels.regression
[15]:
# Predicting the time series using the sampled betas
pred = glm.predict(ts_long, sampled_betas, dms)
display(pred)
<xarray.DataArray (time: 23239, channel: 20, chromo: 2, sample: 100)> Size: 744MB -0.09512 -0.09486 -0.09518 -0.09504 ... 0.0001133 8.782e-05 6.622e-05 8.512e-05 Coordinates: * time (time) float64 186kB 0.0 0.128 0.256 ... 2.974e+03 2.974e+03 * chromo (chromo) <U3 24B 'HbO' 'HbR' samples (time) int64 186kB 0 1 2 3 4 ... 23235 23236 23237 23238 * channel (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8' short_channel (channel) object 160B 'S1D9' 'S1D9' ... 'S8D16' 'S8D16' comp_group (channel) int64 160B 0 0 0 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 7 7 source (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S8' 'S8' detector (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D7' 'D7' 'D8' Dimensions without coordinates: sample
[16]:
# Plot a band between mean-3*std and mean+3*std
# We select a 20 second window for better visualization
pred_mean = pred.mean("sample")
pred_std = pred.std("sample")
mm = pred_mean.loc[slice(60,80), "S5D5", "HbO"]
ss = pred_std.loc[slice(60,80), "S5D5", "HbO"]
p.plot(mm.time, mm, c="r")
p.fill_between(mm.time, mm-3*ss, mm+3*ss, fc="y", alpha=.8)
p.xlabel("time / s")
p.ylabel(r"$\Delta$ c / uM");
