GLM Illustrative Example

This notebok showcases cedalion’s GLM functionality. It creates a GLM design matrix and uses it to simulate a simple timeseries. It then fits different models to the simulated timeseries.

[1]:
import cedalion
import cedalion.io
import cedalion.nirs
import cedalion.dataclasses as cdc
import cedalion.models.glm as glm
import cedalion.plots as plots
import numpy as np
import xarray as xr
import matplotlib.pyplot as p
import pandas as pd
from cedalion import units
xr.set_options(display_expand_data=False);

Creating a simple simulated time series

1. Build a NDTimeSeries with noise

[2]:

fs = 10.0 * cedalion.units.Hz # sampling rate T = 240 * cedalion.units.s # time series length channel = ["S1D1", "S1D2"] # two channels chromo = ["HbO", "HbR"] # two chromophores nsample = int(T * fs) # number of samples # create a NDTimeSeries that contains normal distributed noise ts = cdc.build_timeseries( np.random.normal(0, 0.05, (nsample, len(channel), len(chromo))), dims=["time", "channel", "chromo"], time=np.arange(nsample) / fs, channel=channel, value_units=units.uM, time_units=units.s, other_coords={"chromo": chromo}, ) display(ts)
/home/runner/miniconda3/envs/cedalion/lib/python3.11/site-packages/xarray/core/indexes.py:469: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  index = pd.Index(np.asarray(array), **kwargs)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
[µM] 0.01394 -0.07415 -0.008105 0.007497 ... -0.00731 0.04531 -0.01772 0.03353
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
    samples  (time) int64 19kB 0 1 2 3 4 5 6 ... 2394 2395 2396 2397 2398 2399
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'

2. Build the Stimulus DataFrame

Specify two trial types: ‘StimA’, ‘StimB’ and define for each 3 trials with a duration of 10s.

The trials get different values assigned, which control the amplitude of the hemodynamic response.

The stimulus Dataframe needs the columns ‘trial_type’, ‘onset’, ‘duration’ and ‘value’.

[3]:
stim = pd.concat(
    (
        pd.DataFrame({"onset": o, "trial_type": "StimA"} for o in [10, 80, 150]),
        pd.DataFrame({"onset": o, "trial_type": "StimB"} for o in [45, 115, 185]),
    )
)

stim["value"] = [0.5, 1, 1.5, 1.25, 0.75, 1.0]
stim["duration"] = 10.
display(stim)
onset trial_type value duration
0 10 StimA 0.50 10.0
1 80 StimA 1.00 10.0
2 150 StimA 1.50 10.0
0 45 StimB 1.25 10.0
1 115 StimB 0.75 10.0
2 185 StimB 1.00 10.0

3. Build a Design Matrix

  • Cedalion provides the convenience function glm.make_design_matrix to specify model

  • two outputs:

    1. a design matrix that applies to all channels, with

    • HRF regressors

    • drift regressors

    • constant term

    1. a list of channel-wise regressors with

    • regressors that can differ between channels. E.g. for the short-distance channel regression one wants to choose for each long channel the content of a short channel.

The functional form of the HRF regressors is specified by the basis_function argument. Please refer to the notebook glm_basis_functions.ipynb and the sphinx documentation for more details.

[4]:
dms = (
    glm.design_matrix.hrf_regressors(
        ts, stim, glm.Gamma(tau=0 * units.s, sigma=5 * units.s, T=10 * units.s)
    )
    & glm.design_matrix.drift_regressors(ts, drift_order=0)
)

display(dms.common)
display(dms.channel_wise)

dm = dms.common
<xarray.DataArray (time: 2400, regressor: 3, chromo: 2)> Size: 115kB
0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0
Coordinates:
  * time       (time) float64 19kB 0.0 0.1 0.2 0.3 ... 239.6 239.7 239.8 239.9
  * regressor  (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
[]
[5]:
# For this use case we want the HbR regressors to be
# inverted and smaller in amplitude than their HbO counterparts.
dm.loc[:, ["HRF StimA", "HRF StimB"], "HbR"] *= -0.25
display(dm)
display('channel_wise_regressors:', dms.channel_wise)
<xarray.DataArray (time: 2400, regressor: 3, chromo: 2)> Size: 115kB
0.0 -0.0 0.0 -0.0 1.0 1.0 0.0 -0.0 0.0 ... 1.0 1.0 0.0 -0.0 0.0 -0.0 1.0 1.0
Coordinates:
  * time       (time) float64 19kB 0.0 0.1 0.2 0.3 ... 239.6 239.7 239.8 239.9
  * regressor  (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
'channel_wise_regressors:'
[]

The design matrix is a xr.DataArray with dimensions ‘time’, ‘chromo’ (or ‘wavelength’) and ‘regressor’. Each regressor has a string label for clarity. The convention used by make_design_matrix is to use labels of the form 'HRF <trial_typ> <number>' for the HRF regressors and 'Drift <number>' for the drift components.

Using such a schema is convenient when one needs to select regressors. If there would be multiple regressors for stimulus “StimA” one could distinguish all these from other HRF or drift regressors by selecting labels that start with ‘HRF StimA’.

[6]:
f, ax = p.subplots(1,2,figsize=(12,5))
dm.sel(chromo="HbO").plot(ax=ax[0], vmin=-1, vmax=1, cmap='RdBu_r')
dm.sel(chromo="HbR").plot(ax=ax[1], vmin=-1, vmax=1, cmap='RdBu_r')
p.xticks(rotation=90)
p.show()

f, ax = p.subplots(1,2,figsize=(12,3))
for i,chromo in enumerate(dm.chromo.values):
    for reg in dm.regressor.values:
        ax[i].plot(dm.time, dm.sel(chromo=chromo, regressor=reg), label=reg)
    plots.plot_stim_markers(ax[i], stim, y=1)
    ax[i].set_title(f"chromo={chromo}")
    ax[i].legend()

../../_images/examples_modeling_33_glm_illustrative_example_10_0.png
../../_images/examples_modeling_33_glm_illustrative_example_10_1.png

4. Add regressors to time series with noise

The time series has two channels: ‘S1D1’ and ‘S1D2’. In this example during trial ‘StimA’ activations should occur only in ‘S1D1’. During ‘StimB’ activations are only in the other channel.

The regressors are added with different offsets and scaling factors.

[7]:
# define offsets and scaling factors
SCALE_STIMA = 1.25
OFFSET_STIMA = 0.5
SCALE_STIMB = 0.75
OFFSET_STIMB = 0.25

# add scaled regressor and offsets to time series, which up to now contains only noise
ts.loc[:, "S1D1", :] += (
    SCALE_STIMA * dm.sel(regressor="HRF StimA").pint.quantify("uM")
    + OFFSET_STIMA * cedalion.units.uM
)
ts.loc[:, "S1D2", :] += (
    SCALE_STIMB * dm.sel(regressor="HRF StimB").pint.quantify("uM")
    + OFFSET_STIMB * cedalion.units.uM
)

# plot original regressors for StimA and StimB
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, reg in enumerate(["HRF StimA", "HRF StimB"]):
    ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbO"), "r-")
    ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbR"), "b-")
    ax[i].set_title(f"Reg {reg}")
    plots.plot_stim_markers(ax[i], stim, y=1)
    ax[i].grid(True)
p.tight_layout()

# plot the resulting time series
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, ch in enumerate(ts.channel.values):
    ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
    ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
    ax[i].set_title(f"Ch {ch}")
    ax[i].grid(True)
    plots.plot_stim_markers(ax[i], stim, y=1)
p.tight_layout()
../../_images/examples_modeling_33_glm_illustrative_example_12_0.png
../../_images/examples_modeling_33_glm_illustrative_example_12_1.png

Fitting the GLM - using the same design matrix

The method glm.fit is used to fit the GLM to the time series.

  • only ‘Ordinary Least Squares’ (ols) is currently implemented

  • more realistic noise models, AR-ILS not available, yet

  • no stats, uncertainties

  • the returned coefficients / betas are stored again in an xr.DataArray

[8]:
ts.isel(time=0)
[8]:
<xarray.DataArray (channel: 2, chromo: 2)> Size: 32B
[µM] 0.5139 0.4259 0.2419 0.2575
Coordinates:
    time     float64 8B 0.0
    samples  int64 8B 0
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
[9]:
result = glm.fit(ts, dms, noise_model="ols")
display(result)
 50%|█████     | 1/2 [00:00<00:00, 85.24it/s]
 50%|█████     | 1/2 [00:00<00:00, 85.54it/s]
<xarray.DataArray (channel: 2, chromo: 2)> Size: 32B
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7f1...
Coordinates:
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
Attributes:
    description:  OLS model via statsmodels.regression
[10]:
betas = result.sm.params
betas
[10]:
<xarray.DataArray (channel: 2, chromo: 2, regressor: 3)> Size: 96B
1.256 0.0007078 0.4993 1.252 -0.01869 ... 0.7484 0.2508 0.02455 0.744 0.2509
Coordinates:
  * regressor  (regressor) object 24B 'HRF StimA' 'HRF StimB' 'Drift 0'
  * channel    (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
Attributes:
    description:  OLS model via statsmodels.regression

Translate the DataArray into a DataFrame to get a rendered table. Here, the scaling factors and offsets are added as an additional column as these are the expected values for the coefficients.

[11]:
df = betas.rename("betas_S1D1").to_dataframe()
# add a column with expected values
df["expected"] = [
    SCALE_STIMA, SCALE_STIMA,
    0.0, 0.0,
    0.0, 0.0,
    SCALE_STIMB, SCALE_STIMB,
    OFFSET_STIMA, OFFSET_STIMA,
    OFFSET_STIMB, OFFSET_STIMB,
]
display(df)
betas_S1D1 expected
channel chromo regressor
S1D1 HbO HRF StimA 1.255802 1.25
HRF StimB 0.000708 1.25
Drift 0 0.499258 0.00
HbR HRF StimA 1.251502 0.00
HRF StimB -0.018693 0.00
Drift 0 0.501161 0.00
S1D2 HbO HRF StimA -0.002769 0.75
HRF StimB 0.748437 0.75
Drift 0 0.250823 0.50
HbR HRF StimA 0.024548 0.50
HRF StimB 0.744023 0.25
Drift 0 0.250920 0.25
[12]:
# helper function to compare original time series and model prediction
def plot_data_to_fit_comparison(ts, pred, stim):
    f, ax = p.subplots(2,1, sharex=True, figsize=(12,4))
    for i, ch in enumerate(ts.channel.values):
        ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
        ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
        ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbO"), "-", c="#e41a1c", lw=2)
        ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbR"), "-", c="#377eb8", lw=2)
        ax[i].set_title(f"Ch {ch}")
        plots.plot_stim_markers(ax[i], stim, y=1)
    p.tight_layout()
[13]:
# use all regressors of the design matrix to predict the time series
pred = glm.predict(ts, betas, dms)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim)


# use only HRF-related regressors, i.e. remove the drift/offset
pred = glm.predict(
    ts,
    # select regressor whose label start with HRF Stim
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dms
)
plot_data_to_fit_comparison(ts, pred, stim)

<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.4993 0.5012 0.2508 0.2509 0.4993 0.5012 ... 0.2509 0.4993 0.5012 0.2508 0.2509
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_20_1.png
../../_images/examples_modeling_33_glm_illustrative_example_20_2.png

Fitting the GLM - this time using a slightly different model

[14]:
# copy the stimulus DataFrame and set all values to 1, i.e.
# there is no prior knowledge about amplitude differences between trials
stim_other = stim.copy()
stim_other["value"] = 1.
display(stim_other)

# this design matrix also uses Gamma basis functions but
# the onset (tau) is delayed and the HRF width (sigma) is longer.
dms_other = (
    glm.design_matrix.hrf_regressors(
        ts, stim, glm.Gamma(tau=1 * units.s, sigma=7 * units.s, T=10 * units.s)
    )
    & glm.design_matrix.drift_regressors(ts, drift_order=0)
)


betas = glm.fit(ts, dms_other, noise_model="ols").sm.params

# display the fitted betas as a DataFrame
display(betas.rename("betas_S1D1").to_dataframe())
onset trial_type value duration
0 10 StimA 1.0 10.0
1 80 StimA 1.0 10.0
2 150 StimA 1.0 10.0
0 45 StimB 1.0 10.0
1 115 StimB 1.0 10.0
2 185 StimB 1.0 10.0
 50%|█████     | 1/2 [00:00<00:00, 86.60it/s]
 50%|█████     | 1/2 [00:00<00:00, 85.18it/s]
betas_S1D1
channel chromo regressor
S1D1 HbO HRF StimA 1.073087
HRF StimB -0.014199
Drift 0 0.508737
HbR HRF StimA -0.269164
HRF StimB 0.008863
Drift 0 0.498878
S1D2 HbO HRF StimA -0.015452
HRF StimB 0.638140
Drift 0 0.257816
HbR HRF StimA -0.006134
HRF StimB -0.158854
Drift 0 0.249647
[15]:
pred = glm.predict(ts, betas, dms_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)


pred = glm.predict(
    ts,
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dms_other
)
plot_data_to_fit_comparison(ts, pred, stim_other)

<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.5087 0.4989 0.2578 0.2496 0.5087 0.4989 ... 0.2496 0.5087 0.4989 0.2578 0.2496
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_23_1.png
../../_images/examples_modeling_33_glm_illustrative_example_23_2.png

Fitting with multiple gaussian kernels

[16]:
dms_other = glm.design_matrix.hrf_regressors(
    ts,
    stim,
    glm.GaussianKernels(
        t_pre=5 * units.s, t_post=30 * units.s, t_delta=3 * units.s, t_std=2 * units.s
    ),
) & glm.design_matrix.drift_regressors(ts, drift_order=0)

dm_other = dms_other.common

betas = glm.fit(ts, dms_other, noise_model="ols").sm.params

f,ax = p.subplots(1,1, figsize=(12,4))
for reg in dm_other.regressor.values:
    p.plot(dm_other.time, dm_other.sel(chromo="HbO", regressor=reg), label=reg)
plots.plot_stim_markers(ax, stim_other, y=1.)
p.legend(ncol=3, loc="center right")
p.xlim(0,90)
 50%|█████     | 1/2 [00:00<00:00, 86.03it/s]
 50%|█████     | 1/2 [00:00<00:00, 84.84it/s]
[16]:
(0.0, 90.0)
../../_images/examples_modeling_33_glm_illustrative_example_25_2.png
[17]:
# translate the xr.DataArray into a pd.DataFrame which are displayed as tables
display(betas.rename("betas_S1D1").to_dataframe())
betas_S1D1
channel chromo regressor
S1D1 HbO HRF StimA 00 -0.000962
HRF StimA 01 -0.014231
HRF StimA 02 0.014054
HRF StimA 03 -0.014099
HRF StimA 04 0.054141
... ... ... ...
S1D2 HbR HRF StimB 07 -0.072202
HRF StimB 08 -0.093719
HRF StimB 09 -0.024040
HRF StimB 10 0.002919
Drift 0 0.251597

92 rows × 1 columns

[18]:
pred = glm.predict(ts, betas, dms_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)


pred = glm.predict(
    ts,
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dms_other,
)
plot_data_to_fit_comparison(ts, pred, stim_other)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.4999 0.499 0.2503 0.2516 0.4999 0.499 ... 0.2516 0.4999 0.499 0.2503 0.2516
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_27_1.png
../../_images/examples_modeling_33_glm_illustrative_example_27_2.png