Statsmodels Methods Overview

[1]:
# This cells setups the environment when executed in Google Colab.
try:
    import google.colab
    !curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/dev/scripts/colab_setup.py -o colab_setup.py
    # Select branch with --branch "branch name" (default is "dev")
    %run colab_setup.py
except ImportError:
    pass

Cedalion uses statsmodels for its GLM fitting functionality, and this notebook gives an overview of some common statsmodels methods. The glm.fit function returns an xr.DataArray of statsmodels RegressionResults objects with dimensions (channel, chromo). Any RegressionResults method can be called on this DataArray using the .sm accessor. A full list of available methods and attribute can be found in the statsmodels documentation.

In this notebook, we’ll assume that we have already loaded our data and set up the GLM. See the other GLM notebooks for details on setup.

We’ll start by fitting the GLM and displaying the resulting object.

[3]:
results = glm.fit(ts_long, dms)
display(results)
 67%|██████▋   | 2/3 [00:00<00:00, 154.85it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 158.68it/s]
 50%|█████     | 1/2 [00:00<00:00, 82.40it/s]
 50%|█████     | 1/2 [00:00<00:00, 84.03it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 157.67it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 168.49it/s]
 50%|█████     | 1/2 [00:00<00:00, 84.39it/s]
 50%|█████     | 1/2 [00:00<00:00, 86.31it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 166.11it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 161.85it/s]
 50%|█████     | 1/2 [00:00<00:00, 87.25it/s]
 50%|█████     | 1/2 [00:00<00:00, 86.61it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 168.74it/s]
 67%|██████▋   | 2/3 [00:00<00:00, 167.89it/s]
 50%|█████     | 1/2 [00:00<00:00, 84.58it/s]
 50%|█████     | 1/2 [00:00<00:00, 87.11it/s]
<xarray.DataArray (channel: 20, chromo: 2)> Size: 320B
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f5...
Coordinates:
  * chromo    (chromo) <U3 24B 'HbO' 'HbR'
  * channel   (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source    (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector  (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

The results object is a DataArray of statsmodels RegressionResults: one for each channel/chromophore. In order to call a method on our results object, we just use the accessor .sm, followed by the RegressionResults method. Cedalion handles calling the method on each individual RegressionResults object in the results DataArray, and returns the outputs in a new DataArray with the appropriate dimensions. This allows us to get information on all channels simply and concisely.

Beta Coefficients (params)

First, we’ll retreive the coefficients of the GLM fit.

[4]:
results.sm.params
[4]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB
-0.001429 0.1604 0.2017 0.7729 -0.003184 ... -0.01349 -0.01729 -0.01956 -0.03845
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

Standard Error

The method bse returns the standard errors of the parameter estimates

[5]:
results.sm.bse
[5]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB
0.004972 0.005014 0.005029 0.005514 0.0024 ... 0.00284 0.00284 0.002842 0.006336
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

Confidence Intervals (conf_int)

The method conf_int calculates the confidence interval of the fitted parameters. We can specify the alpha level for the confidence interval (default 5%).

In the output, the index conf_int marks the low (conf_int=0) and high (conf_int=1) endpoints of the confidence interval.

[6]:
results.sm.conf_int(alpha=0.05)
[6]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4, conf_int: 2)> Size: 3kB
-0.01117 0.008315 0.1505 0.1702 0.1918 ... -0.02513 -0.01399 -0.05087 -0.02603
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * conf_int   (conf_int) int64 16B 0 1
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

Covariance and Variance

The method cov_params computes the covariance matrix. Note that we can recover the variances from the diagonal elements of the matrix.

[7]:
results.sm.cov_params()
[7]:
<xarray.DataArray (channel: 20, chromo: 2, regressor_r: 4, regressor_c: 4)> Size: 5kB
2.472e-05 -4.509e-08 -5.209e-08 3.787e-07 ... 2.327e-07 7.423e-07 4.014e-05
Coordinates:
  * regressor_r  (regressor_r) object 32B 'HRF control' ... 'short'
  * regressor_c  (regressor_c) object 32B 'HRF control' ... 'short'
  * chromo       (chromo) <U3 24B 'HbO' 'HbR'
  * channel      (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source       (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S8' 'S8'
    detector     (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

Here we visualize the covariance matrix for a single regressor:

[8]:
p.imshow(results.sm.cov_params()[0,0,:,:]);
../../_images/examples_modeling_35_statsmodels_overview_17_0.png

The convenience function sm.regressor_variances computes the variances of the regressors, i.e. the diagonal elements of the covariance matrices.

[9]:
# returns diagonal elements of the cov matrices
results.sm.regressor_variances()
[9]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB
2.472e-05 2.514e-05 2.529e-05 3.04e-05 ... 8.064e-06 8.076e-06 4.014e-05
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

Statistical Tests - T-values, P-values

Statsmodels also has sophisticated functionality for performing statistical tests on regression results.

The method t-values simply returns the t-statistic for each coefficient.

[10]:
results.sm.tvalues
[10]:
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4)> Size: 1kB
-0.2875 31.98 40.1 140.2 -1.327 -17.83 ... 97.04 -4.749 -6.089 -6.882 -6.069
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

The method t-test allows for general linear hypothesis tests.

We can specify contrasts either by passing an r-matrix or through strings. See the patsy documentation for details on specifying linear contrasts using strings.

The method returns an array of statsmodel ContrastResult objects.

[11]:
# Specifying hypotheses for t-test as string
hypotheses = "HRF Tapping/Left = HRF control, HRF Tapping/Right = HRF control"

contrast_results = results.sm.t_test(hypotheses)
display(contrast_results)

<xarray.DataArray (channel: 20, chromo: 2)> Size: 320B
                             Test for Constraints                            ...
Coordinates:
  * chromo    (chromo) <U3 24B 'HbO' 'HbR'
  * channel   (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source    (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector  (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Attributes:
    description:  OLS model via statsmodels.regression

We can use the .sm accessor on the resulting DataArray of ContrastResult objects, just like we did before with the RegressionResult arrays.

The convenience functions sm.tvalues() and sm.pvalues() return the t- and p-values of the contrast, respectively.

The sm.map method works analogously to the map function in python, applying a given function to each cell of the DataArray.

Below, we extract the t-values of the contrast using first the map method and then the convenience function.

[12]:
# Extracting t-values from the contrast results
display(contrast_results.sm.map(lambda i : i.tvalue, name="hypothesis"))

display(contrast_results.sm.t_values())
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B
22.89 28.69 -11.67 -14.53 18.91 20.18 ... -12.03 -3.51 11.8 9.7 -0.947 -1.511
Coordinates:
  * chromo    (chromo) <U3 24B 'HbO' 'HbR'
  * channel   (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source    (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector  (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Dimensions without coordinates: hypothesis
Attributes:
    description:  OLS model via statsmodels.regression
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B
22.89 28.69 -11.67 -14.53 18.91 20.18 ... -12.03 -3.51 11.8 9.7 -0.947 -1.511
Coordinates:
  * chromo    (chromo) <U3 24B 'HbO' 'HbR'
  * channel   (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source    (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector  (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Dimensions without coordinates: hypothesis
Attributes:
    description:  OLS model via statsmodels.regression
[13]:
# Extracting p-values from the contrast results
display(contrast_results.sm.p_values())
<xarray.DataArray (channel: 20, chromo: 2, hypothesis: 2)> Size: 640B
9.929e-115 6.125e-178 2.164e-31 1.18e-47 ... 5.073e-32 3.313e-22 0.3436 0.1307
Coordinates:
  * chromo    (chromo) <U3 24B 'HbO' 'HbR'
  * channel   (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source    (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector  (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Dimensions without coordinates: hypothesis
Attributes:
    description:  OLS model via statsmodels.regression

Plotting Uncertainty Bands

In this section, we explore a technique, still in development, for visualizing uncertainty in a GLM with many regressors. This method visualizes the uncertainty in a GLM fit by drawing multiple samples of the beta coefficients from their estimated covariance (via multivariate normal sampling). It then uses these sampled betas to generate predicted time courses, and plots the mean prediction with a shaded band representing ±3 standard deviations across samples, thus capturing the variability due to model uncertainty. The band is quite narrow because uncertainty is low in this toy example.

FIXME: Band even smaller than before?

[14]:
# Sample betas

betas = results.sm.params
cov = results.sm.cov_params()

sampled_betas = xr.zeros_like(betas).expand_dims({"sample" : 100}, axis=-1).copy()
for i_ch in range(sampled_betas.shape[0]):
    for i_cr in range(sampled_betas.shape[1]):
        sampled_betas[i_ch, i_cr, :, :] = np.random.multivariate_normal(
            betas[i_ch, i_cr, :],
            cov[i_ch, i_cr, :, :],
            size=100,
        ).T

display(sampled_betas)
<xarray.DataArray (channel: 20, chromo: 2, regressor: 4, sample: 100)> Size: 128kB
-0.004786 -0.001207 -0.006483 0.002115 ... -0.054 -0.04183 -0.03155 -0.04055
Coordinates:
  * regressor  (regressor) object 32B 'HRF control' ... 'short'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
  * channel    (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    source     (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S7' 'S8' 'S8'
    detector   (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D6' 'D7' 'D7' 'D8'
Dimensions without coordinates: sample
Attributes:
    description:  OLS model via statsmodels.regression
[15]:
# Predicting the time series using the sampled betas
pred = glm.predict(ts_long, sampled_betas, dms)
display(pred)
<xarray.DataArray (time: 23239, channel: 20, chromo: 2, sample: 100)> Size: 744MB
-0.09512 -0.09486 -0.09518 -0.09504 ... 0.0001133 8.782e-05 6.622e-05 8.512e-05
Coordinates:
  * time           (time) float64 186kB 0.0 0.128 0.256 ... 2.974e+03 2.974e+03
  * chromo         (chromo) <U3 24B 'HbO' 'HbR'
    samples        (time) int64 186kB 0 1 2 3 4 ... 23235 23236 23237 23238
  * channel        (channel) object 160B 'S1D1' 'S1D2' 'S1D3' ... 'S8D7' 'S8D8'
    short_channel  (channel) object 160B 'S1D9' 'S1D9' ... 'S8D16' 'S8D16'
    comp_group     (channel) int64 160B 0 0 0 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 7 7
    source         (channel) object 160B 'S1' 'S1' 'S1' 'S2' ... 'S7' 'S8' 'S8'
    detector       (channel) object 160B 'D1' 'D2' 'D3' 'D1' ... 'D7' 'D7' 'D8'
Dimensions without coordinates: sample
[16]:
# Plot a band between mean-3*std and mean+3*std
# We select a 20 second window for better visualization
pred_mean = pred.mean("sample")
pred_std = pred.std("sample")

mm = pred_mean.loc[slice(60,80), "S5D5", "HbO"]
ss = pred_std.loc[slice(60,80), "S5D5", "HbO"]

p.plot(mm.time, mm, c="r")
p.fill_between(mm.time, mm-3*ss, mm+3*ss, fc="y", alpha=.8)

p.xlabel("time / s")
p.ylabel(r"$\Delta$  c / uM");
../../_images/examples_modeling_35_statsmodels_overview_31_0.png