GLM Illustrative Example

This notebok showcases cedalion’s GLM functionality. It creates a GLM design matrix and uses it to simulate a simple timeseries. It then fits different models to the simulated timeseries.

[1]:
import cedalion
import cedalion.io
import cedalion.nirs
import cedalion.dataclasses as cdc
import cedalion.models.glm as glm
import cedalion.plots as plots
import numpy as np
import xarray as xr
from pathlib import Path
import matplotlib.pyplot as p
import pandas as pd
from cedalion import units
xr.set_options(display_expand_data=False);

Creating a simple simulated time series

1. Build a NDTimeSeries with noise

[2]:

fs = 10.0 * cedalion.units.Hz # sampling rate T = 240 * cedalion.units.s # time series length channel = ["S1D1", "S1D2"] # two channels chromo = ["HbO", "HbR"] # two chromophores nsample = int(T * fs) # number of samples # create a NDTimeSeries that contains normal distributed noise ts = cdc.build_timeseries( np.random.normal(0, 0.05, (nsample, len(channel), len(chromo))), dims=["time", "channel", "chromo"], time=np.arange(nsample) / fs, channel=channel, value_units=units.uM, time_units=units.s, other_coords={"chromo": chromo}, ) display(ts)
/home/runner/miniconda3/envs/cedalion/lib/python3.11/site-packages/xarray/core/indexes.py:469: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  index = pd.Index(np.asarray(array), **kwargs)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
[µM] -0.02982 0.01249 -0.08366 0.01252 ... -0.05243 -0.0007755 -0.01382 0.004107
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
    samples  (time) int64 19kB 0 1 2 3 4 5 6 ... 2394 2395 2396 2397 2398 2399
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'

2. Build the Stimulus DataFrame

Specify two trial types: ‘StimA’, ‘StimB’ and define for each 3 trials with a duration of 10s.

The trials get different values assigned, which control the amplitude of the hemodynamic response.

The stimulus Dataframe needs the columns ‘trial_type’, ‘onset’, ‘duration’ and ‘value’.

[3]:
stim = pd.concat(
    (
        pd.DataFrame({"onset": o, "trial_type": "StimA"} for o in [10, 80, 150]),
        pd.DataFrame({"onset": o, "trial_type": "StimB"} for o in [45, 115, 185]),
    )
)

stim["value"] = [0.5, 1, 1.5, 1.25, 0.75, 1.0]
stim["duration"] = 10.
display(stim)
onset trial_type value duration
0 10 StimA 0.50 10.0
1 80 StimA 1.00 10.0
2 150 StimA 1.50 10.0
0 45 StimB 1.25 10.0
1 115 StimB 0.75 10.0
2 185 StimB 1.00 10.0

3. Build a Design Matrix

  • Cedalion provides the convenience function glm.make_design_matrix to specify model

  • two outputs:

    1. a design matrix that applies to all channels, with

    • HRF regressors

    • drift regressors

    • constant term

    1. a list of channel-wise regressors with

    • regressors that can differ between channels. E.g. for the short-distance channel regression one wants to choose for each long channel the content of a short channel.

The functional form of the HRF regressors is specified by the basis_function argument. Please refer to the notebook glm_basis_functions.ipynb and the sphinx documentation for more details.

[4]:
dm, channel_wise_regressors = glm.make_design_matrix(
    ts_long=ts,
    ts_short=None,
    stim=stim,
    geo3d=None,
    basis_function=glm.Gamma(tau=0 * units.s, sigma=5 * units.s, T=10 * units.s),
    drift_order=0,
    short_channel_method=None,
)
# For this use case we want the HbR regressors to be
# inverted and smaller in amplitude than their HbO counterparts.
dm.loc[:, ["HRF StimA", "HRF StimB"], "HbR"] *= -0.25
display(dm)
display('channel_wise_regressors:', channel_wise_regressors)
<xarray.DataArray (time: 2400, regressor: 3, chromo: 2)> Size: 115kB
0.0 -0.0 0.0 -0.0 1.0 1.0 0.0 -0.0 0.0 ... 1.0 1.0 0.0 -0.0 0.0 -0.0 1.0 1.0
Coordinates:
  * time       (time) float64 19kB 0.0 0.1 0.2 0.3 ... 239.6 239.7 239.8 239.9
  * regressor  (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'
'channel_wise_regressors:'
None

The design matrix is a xr.DataArray with dimensions ‘time’, ‘chromo’ (or ‘wavelength’) and ‘regressor’. Each regressor has a string label for clarity. The convention used by make_design_matrix is to use labels of the form 'HRF <trial_typ> <number>' for the HRF regressors and 'Drift <number>' for the drift components.

Using such a schema is convenient when one needs to select regressors. If there would be multiple regressors for stimulus “StimA” one could distinguish all these from other HRF or drift regressors by selecting labels that start with ‘HRF StimA’.

[5]:
f, ax = p.subplots(1,2,figsize=(12,5))
dm.sel(chromo="HbO").plot(ax=ax[0], vmin=-1, vmax=1, cmap='RdBu_r')
dm.sel(chromo="HbR").plot(ax=ax[1], vmin=-1, vmax=1, cmap='RdBu_r')
p.xticks(rotation=90)
p.show()

f, ax = p.subplots(1,2,figsize=(12,3))
for i,chromo in enumerate(dm.chromo.values):
    for reg in dm.regressor.values:
        ax[i].plot(dm.time, dm.sel(chromo=chromo, regressor=reg), label=reg)
    plots.plot_stim_markers(ax[i], stim, y=1)
    ax[i].set_title(f"chromo={chromo}")
    ax[i].legend()

../../_images/examples_modeling_33_glm_illustrative_example_9_0.png
../../_images/examples_modeling_33_glm_illustrative_example_9_1.png

4. Add regressors to time series with noise

The time series has two channels: ‘S1D1’ and ‘S1D2’. In this example during trial ‘StimA’ activations should occur only in ‘S1D1’. During ‘StimB’ activations are only in the other channel.

The regressors are added with different offsets and scaling factors.

[6]:
# define offsets and scaling factors
SCALE_STIMA = 1.25
OFFSET_STIMA = 0.5
SCALE_STIMB = 0.75
OFFSET_STIMB = 0.25

# add scaled regressor and offsets to time series, which up to now contains only noise
ts.loc[:, "S1D1", :] += (
    SCALE_STIMA * dm.sel(regressor="HRF StimA").pint.quantify("uM")
    + OFFSET_STIMA * cedalion.units.uM
)
ts.loc[:, "S1D2", :] += (
    SCALE_STIMB * dm.sel(regressor="HRF StimB").pint.quantify("uM")
    + OFFSET_STIMB * cedalion.units.uM
)

# plot original regressors for StimA and StimB
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, reg in enumerate(["HRF StimA", "HRF StimB"]):
    ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbO"), "r-")
    ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbR"), "b-")
    ax[i].set_title(f"Reg {reg}")
    plots.plot_stim_markers(ax[i], stim, y=1)
    ax[i].grid(True)
p.tight_layout()

# plot the resulting time series
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, ch in enumerate(ts.channel.values):
    ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
    ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
    ax[i].set_title(f"Ch {ch}")
    ax[i].grid(True)
    plots.plot_stim_markers(ax[i], stim, y=1)
p.tight_layout()
../../_images/examples_modeling_33_glm_illustrative_example_11_0.png
../../_images/examples_modeling_33_glm_illustrative_example_11_1.png

Fitting the GLM - using the same design matrix

The method glm.fit is used to fit the GLM to the time series.

  • only ‘Ordinary Least Squares’ (ols) is currently implemented

  • more realistic noise models, AR-ILS not available, yet

  • no stats, uncertainties

  • the returned coefficients / betas are stored again in an xr.DataArray

[7]:
betas = glm.fit(ts, dm, channel_wise_regressors, noise_model="ols")
display(betas)
<xarray.DataArray (regressor: 3, channel: 2, chromo: 2)> Size: 96B
1.255 1.255 -0.001301 0.03587 -0.01106 ... 0.7442 0.5005 0.5016 0.2505 0.2512
Coordinates:
  * regressor  (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0'
  * channel    (channel) <U4 32B 'S1D1' 'S1D2'
  * chromo     (chromo) <U3 24B 'HbO' 'HbR'

Translate the DataArray into a DataFrame to get a rendered table. Here, the scaling factors and offsets are added as an additional column as these are the expected values for the coefficients.

[8]:
df = betas.rename("betas_S1D1").to_dataframe()
# add a column with expected values
df["expected"] = [
    SCALE_STIMA, SCALE_STIMA,
    0.0, 0.0,
    0.0, 0.0,
    SCALE_STIMB, SCALE_STIMB,
    OFFSET_STIMA, OFFSET_STIMA,
    OFFSET_STIMB, OFFSET_STIMB,
]
display(df)
betas_S1D1 expected
regressor channel chromo
HRF StimA S1D1 HbO 1.255031 1.25
HbR 1.254819 1.25
S1D2 HbO -0.001301 0.00
HbR 0.035873 0.00
HRF StimB S1D1 HbO -0.011055 0.00
HbR 0.003103 0.00
S1D2 HbO 0.748614 0.75
HbR 0.744215 0.75
Drift 0 S1D1 HbO 0.500517 0.50
HbR 0.501592 0.50
S1D2 HbO 0.250505 0.25
HbR 0.251182 0.25
[9]:
# helper function to compare original time series and model prediction
def plot_data_to_fit_comparison(ts, pred, stim):
    f, ax = p.subplots(2,1, sharex=True, figsize=(12,4))
    for i, ch in enumerate(ts.channel.values):
        ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
        ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
        ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbO"), "-", c="#e41a1c", lw=2)
        ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbR"), "-", c="#377eb8", lw=2)
        ax[i].set_title(f"Ch {ch}")
        plots.plot_stim_markers(ax[i], stim, y=1)
    p.tight_layout()
[10]:
# use all regressors of the design matrix to predict the time series
pred = glm.predict(ts, betas, dm, channel_wise_regressors)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim)


# use only HRF-related regressors, i.e. remove the drift/offset
pred = glm.predict(
    ts,
    # select regressor whose label start with HRF Stim
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dm,
    channel_wise_regressors,
)
plot_data_to_fit_comparison(ts, pred, stim)

<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.5005 0.5016 0.2505 0.2512 0.5005 0.5016 ... 0.2512 0.5005 0.5016 0.2505 0.2512
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_17_1.png
../../_images/examples_modeling_33_glm_illustrative_example_17_2.png

Fitting the GLM - this time using a slightly different model

[11]:
# copy the stimulus DataFrame and set all values to 1, i.e.
# there is no prior knowledge about amplitude differences between trials
stim_other = stim.copy()
stim_other["value"] = 1.
display(stim_other)

# this design matrix also uses Gamma basis functions but
# the onset (tau) is delayed and the HRF width (sigma) is longer.
dm_other, channel_wise_regressors_other = glm.make_design_matrix(
    ts,
    None,
    stim_other,
    None,
    basis_function=glm.Gamma(tau=1 * units.s, sigma=7 * units.s, T=10 * units.s),
    drift_order=0,
    short_channel_method=None,
)


betas = glm.fit(ts, dm_other, channel_wise_regressors_other, noise_model="ols")

# display the fitted betas as a DataFrame
display(betas.rename("betas_S1D1").to_dataframe())
onset trial_type value duration
0 10 StimA 1.0 10.0
1 80 StimA 1.0 10.0
2 150 StimA 1.0 10.0
0 45 StimB 1.0 10.0
1 115 StimB 1.0 10.0
2 185 StimB 1.0 10.0
betas_S1D1
regressor channel chromo
HRF StimA S1D1 HbO 0.711062
HbR -0.175408
S1D2 HbO -0.008824
HbR -0.004429
HRF StimB S1D1 HbO -0.019069
HbR 0.006574
S1D2 HbO 0.510643
HbR -0.125420
Drift 0 S1D1 HbO 0.510514
HbR 0.497951
S1D2 HbO 0.257398
HbR 0.249386
[12]:
pred = glm.predict(ts, betas, dm_other, channel_wise_regressors_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)


pred = glm.predict(
    ts,
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dm_other,
    channel_wise_regressors_other,
)
plot_data_to_fit_comparison(ts, pred, stim_other)

<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.5105 0.498 0.2574 0.2494 0.5105 0.498 ... 0.2494 0.5105 0.498 0.2574 0.2494
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_20_1.png
../../_images/examples_modeling_33_glm_illustrative_example_20_2.png

Fitting with multiple gaussian kernels

[13]:
dm_other, channel_wise_regressors_other = glm.make_design_matrix(
    ts,
    None,
    stim_other,
    None,
    basis_function=glm.GaussianKernels(
        t_pre=5 * units.s, t_post=30 * units.s, t_delta=3 * units.s, t_std=2 * units.s
    ),
    drift_order=0,
    short_channel_method=None,
)

betas = glm.fit(ts, dm_other, channel_wise_regressors_other, noise_model="ols")

f,ax = p.subplots(1,1, figsize=(12,4))
for reg in dm_other.regressor.values:
    p.plot(dm_other.time, dm_other.sel(chromo="HbO", regressor=reg), label=reg)
plots.plot_stim_markers(ax, stim_other, y=1.)
p.legend(ncol=3, loc="center right")
p.xlim(0,90)
[13]:
(0.0, 90.0)
../../_images/examples_modeling_33_glm_illustrative_example_22_1.png
[14]:
# translate the xr.DataArray into a pd.DataFrame which are displayed as tables
display(betas.rename("betas_S1D1").to_dataframe())
betas_S1D1
regressor channel chromo
HRF StimA 00 S1D1 HbO -0.019340
HbR 0.001062
S1D2 HbO -0.016493
HbR -0.009073
HRF StimA 01 S1D1 HbO 0.009910
... ... ... ...
HRF StimB 10 S1D2 HbR 0.015613
Drift 0 S1D1 HbO 0.499179
HbR 0.500876
S1D2 HbO 0.244598
HbR 0.250009

92 rows × 1 columns

[15]:
pred = glm.predict(ts, betas, dm_other, channel_wise_regressors_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)


pred = glm.predict(
    ts,
    betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
    dm_other,
    channel_wise_regressors_other,
)
plot_data_to_fit_comparison(ts, pred, stim_other)

<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB
0.4992 0.5009 0.2446 0.25 0.4992 0.5009 ... 0.25 0.4992 0.5009 0.2446 0.25
Coordinates:
  * time     (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9
  * chromo   (chromo) <U3 24B 'HbO' 'HbR'
  * channel  (channel) <U4 32B 'S1D1' 'S1D2'
../../_images/examples_modeling_33_glm_illustrative_example_24_1.png
../../_images/examples_modeling_33_glm_illustrative_example_24_2.png