GLM Illustrative Example
This notebok showcases cedalion’s GLM functionality. It creates a GLM design matrix and uses it to simulate a simple timeseries. It then fits different models to the simulated timeseries.
[1]:
import cedalion
import cedalion.io
import cedalion.nirs
import cedalion.dataclasses as cdc
import cedalion.models.glm as glm
import cedalion.plots as plots
import numpy as np
import xarray as xr
from pathlib import Path
import matplotlib.pyplot as p
import pandas as pd
from cedalion import units
xr.set_options(display_expand_data=False);
Creating a simple simulated time series
1. Build a NDTimeSeries with noise
[2]:
fs = 10.0 * cedalion.units.Hz # sampling rate
T = 240 * cedalion.units.s # time series length
channel = ["S1D1", "S1D2"] # two channels
chromo = ["HbO", "HbR"] # two chromophores
nsample = int(T * fs) # number of samples
# create a NDTimeSeries that contains normal distributed noise
ts = cdc.build_timeseries(
np.random.normal(0, 0.05, (nsample, len(channel), len(chromo))),
dims=["time", "channel", "chromo"],
time=np.arange(nsample) / fs,
channel=channel,
value_units=units.uM,
time_units=units.s,
other_coords={"chromo": chromo},
)
display(ts)
/home/runner/miniconda3/envs/cedalion/lib/python3.11/site-packages/xarray/core/indexes.py:469: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
index = pd.Index(np.asarray(array), **kwargs)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB [µM] -0.02982 0.01249 -0.08366 0.01252 ... -0.05243 -0.0007755 -0.01382 0.004107 Coordinates: * time (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9 samples (time) int64 19kB 0 1 2 3 4 5 6 ... 2394 2395 2396 2397 2398 2399 * channel (channel) <U4 32B 'S1D1' 'S1D2' * chromo (chromo) <U3 24B 'HbO' 'HbR'
2. Build the Stimulus DataFrame
Specify two trial types: ‘StimA’, ‘StimB’ and define for each 3 trials with a duration of 10s.
The trials get different values assigned, which control the amplitude of the hemodynamic response.
The stimulus Dataframe needs the columns ‘trial_type’, ‘onset’, ‘duration’ and ‘value’.
[3]:
stim = pd.concat(
(
pd.DataFrame({"onset": o, "trial_type": "StimA"} for o in [10, 80, 150]),
pd.DataFrame({"onset": o, "trial_type": "StimB"} for o in [45, 115, 185]),
)
)
stim["value"] = [0.5, 1, 1.5, 1.25, 0.75, 1.0]
stim["duration"] = 10.
display(stim)
onset | trial_type | value | duration | |
---|---|---|---|---|
0 | 10 | StimA | 0.50 | 10.0 |
1 | 80 | StimA | 1.00 | 10.0 |
2 | 150 | StimA | 1.50 | 10.0 |
0 | 45 | StimB | 1.25 | 10.0 |
1 | 115 | StimB | 0.75 | 10.0 |
2 | 185 | StimB | 1.00 | 10.0 |
3. Build a Design Matrix
Cedalion provides the convenience function
glm.make_design_matrix
to specify modeltwo outputs:
a design matrix that applies to all channels, with
HRF regressors
drift regressors
constant term
a list of channel-wise regressors with
regressors that can differ between channels. E.g. for the short-distance channel regression one wants to choose for each long channel the content of a short channel.
The functional form of the HRF regressors is specified by the basis_function
argument. Please refer to the notebook glm_basis_functions.ipynb
and the sphinx documentation for more details.
[4]:
dm, channel_wise_regressors = glm.make_design_matrix(
ts_long=ts,
ts_short=None,
stim=stim,
geo3d=None,
basis_function=glm.Gamma(tau=0 * units.s, sigma=5 * units.s, T=10 * units.s),
drift_order=0,
short_channel_method=None,
)
# For this use case we want the HbR regressors to be
# inverted and smaller in amplitude than their HbO counterparts.
dm.loc[:, ["HRF StimA", "HRF StimB"], "HbR"] *= -0.25
display(dm)
display('channel_wise_regressors:', channel_wise_regressors)
<xarray.DataArray (time: 2400, regressor: 3, chromo: 2)> Size: 115kB 0.0 -0.0 0.0 -0.0 1.0 1.0 0.0 -0.0 0.0 ... 1.0 1.0 0.0 -0.0 0.0 -0.0 1.0 1.0 Coordinates: * time (time) float64 19kB 0.0 0.1 0.2 0.3 ... 239.6 239.7 239.8 239.9 * regressor (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0' * chromo (chromo) <U3 24B 'HbO' 'HbR'
'channel_wise_regressors:'
None
The design matrix is a xr.DataArray
with dimensions ‘time’, ‘chromo’ (or ‘wavelength’) and ‘regressor’. Each regressor has a string label for clarity. The convention used by make_design_matrix
is to use labels of the form 'HRF <trial_typ> <number>'
for the HRF regressors and 'Drift <number>'
for the drift components.
Using such a schema is convenient when one needs to select regressors. If there would be multiple regressors for stimulus “StimA” one could distinguish all these from other HRF or drift regressors by selecting labels that start with ‘HRF StimA’.
[5]:
f, ax = p.subplots(1,2,figsize=(12,5))
dm.sel(chromo="HbO").plot(ax=ax[0], vmin=-1, vmax=1, cmap='RdBu_r')
dm.sel(chromo="HbR").plot(ax=ax[1], vmin=-1, vmax=1, cmap='RdBu_r')
p.xticks(rotation=90)
p.show()
f, ax = p.subplots(1,2,figsize=(12,3))
for i,chromo in enumerate(dm.chromo.values):
for reg in dm.regressor.values:
ax[i].plot(dm.time, dm.sel(chromo=chromo, regressor=reg), label=reg)
plots.plot_stim_markers(ax[i], stim, y=1)
ax[i].set_title(f"chromo={chromo}")
ax[i].legend()


4. Add regressors to time series with noise
The time series has two channels: ‘S1D1’ and ‘S1D2’. In this example during trial ‘StimA’ activations should occur only in ‘S1D1’. During ‘StimB’ activations are only in the other channel.
The regressors are added with different offsets and scaling factors.
[6]:
# define offsets and scaling factors
SCALE_STIMA = 1.25
OFFSET_STIMA = 0.5
SCALE_STIMB = 0.75
OFFSET_STIMB = 0.25
# add scaled regressor and offsets to time series, which up to now contains only noise
ts.loc[:, "S1D1", :] += (
SCALE_STIMA * dm.sel(regressor="HRF StimA").pint.quantify("uM")
+ OFFSET_STIMA * cedalion.units.uM
)
ts.loc[:, "S1D2", :] += (
SCALE_STIMB * dm.sel(regressor="HRF StimB").pint.quantify("uM")
+ OFFSET_STIMB * cedalion.units.uM
)
# plot original regressors for StimA and StimB
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, reg in enumerate(["HRF StimA", "HRF StimB"]):
ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbO"), "r-")
ax[i].plot(dm.time, dm.sel(regressor=reg, chromo="HbR"), "b-")
ax[i].set_title(f"Reg {reg}")
plots.plot_stim_markers(ax[i], stim, y=1)
ax[i].grid(True)
p.tight_layout()
# plot the resulting time series
f, ax = p.subplots(1, 2, sharex=True, sharey=True, figsize=(12,3))
for i, ch in enumerate(ts.channel.values):
ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
ax[i].set_title(f"Ch {ch}")
ax[i].grid(True)
plots.plot_stim_markers(ax[i], stim, y=1)
p.tight_layout()


Fitting the GLM - using the same design matrix
The method glm.fit
is used to fit the GLM to the time series.
only ‘Ordinary Least Squares’ (ols) is currently implemented
more realistic noise models, AR-ILS not available, yet
no stats, uncertainties
the returned coefficients / betas are stored again in an xr.DataArray
[7]:
betas = glm.fit(ts, dm, channel_wise_regressors, noise_model="ols")
display(betas)
<xarray.DataArray (regressor: 3, channel: 2, chromo: 2)> Size: 96B 1.255 1.255 -0.001301 0.03587 -0.01106 ... 0.7442 0.5005 0.5016 0.2505 0.2512 Coordinates: * regressor (regressor) <U9 108B 'HRF StimA' 'HRF StimB' 'Drift 0' * channel (channel) <U4 32B 'S1D1' 'S1D2' * chromo (chromo) <U3 24B 'HbO' 'HbR'
Translate the DataArray into a DataFrame to get a rendered table. Here, the scaling factors and offsets are added as an additional column as these are the expected values for the coefficients.
[8]:
df = betas.rename("betas_S1D1").to_dataframe()
# add a column with expected values
df["expected"] = [
SCALE_STIMA, SCALE_STIMA,
0.0, 0.0,
0.0, 0.0,
SCALE_STIMB, SCALE_STIMB,
OFFSET_STIMA, OFFSET_STIMA,
OFFSET_STIMB, OFFSET_STIMB,
]
display(df)
betas_S1D1 | expected | |||
---|---|---|---|---|
regressor | channel | chromo | ||
HRF StimA | S1D1 | HbO | 1.255031 | 1.25 |
HbR | 1.254819 | 1.25 | ||
S1D2 | HbO | -0.001301 | 0.00 | |
HbR | 0.035873 | 0.00 | ||
HRF StimB | S1D1 | HbO | -0.011055 | 0.00 |
HbR | 0.003103 | 0.00 | ||
S1D2 | HbO | 0.748614 | 0.75 | |
HbR | 0.744215 | 0.75 | ||
Drift 0 | S1D1 | HbO | 0.500517 | 0.50 |
HbR | 0.501592 | 0.50 | ||
S1D2 | HbO | 0.250505 | 0.25 | |
HbR | 0.251182 | 0.25 |
[9]:
# helper function to compare original time series and model prediction
def plot_data_to_fit_comparison(ts, pred, stim):
f, ax = p.subplots(2,1, sharex=True, figsize=(12,4))
for i, ch in enumerate(ts.channel.values):
ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbO"), "r-")
ax[i].plot(ts.time, ts.sel(channel=ch, chromo="HbR"), "b-")
ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbO"), "-", c="#e41a1c", lw=2)
ax[i].plot(pred.time, pred.sel(channel=ch, chromo="HbR"), "-", c="#377eb8", lw=2)
ax[i].set_title(f"Ch {ch}")
plots.plot_stim_markers(ax[i], stim, y=1)
p.tight_layout()
[10]:
# use all regressors of the design matrix to predict the time series
pred = glm.predict(ts, betas, dm, channel_wise_regressors)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim)
# use only HRF-related regressors, i.e. remove the drift/offset
pred = glm.predict(
ts,
# select regressor whose label start with HRF Stim
betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
dm,
channel_wise_regressors,
)
plot_data_to_fit_comparison(ts, pred, stim)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB 0.5005 0.5016 0.2505 0.2512 0.5005 0.5016 ... 0.2512 0.5005 0.5016 0.2505 0.2512 Coordinates: * time (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9 * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) <U4 32B 'S1D1' 'S1D2'


Fitting the GLM - this time using a slightly different model
[11]:
# copy the stimulus DataFrame and set all values to 1, i.e.
# there is no prior knowledge about amplitude differences between trials
stim_other = stim.copy()
stim_other["value"] = 1.
display(stim_other)
# this design matrix also uses Gamma basis functions but
# the onset (tau) is delayed and the HRF width (sigma) is longer.
dm_other, channel_wise_regressors_other = glm.make_design_matrix(
ts,
None,
stim_other,
None,
basis_function=glm.Gamma(tau=1 * units.s, sigma=7 * units.s, T=10 * units.s),
drift_order=0,
short_channel_method=None,
)
betas = glm.fit(ts, dm_other, channel_wise_regressors_other, noise_model="ols")
# display the fitted betas as a DataFrame
display(betas.rename("betas_S1D1").to_dataframe())
onset | trial_type | value | duration | |
---|---|---|---|---|
0 | 10 | StimA | 1.0 | 10.0 |
1 | 80 | StimA | 1.0 | 10.0 |
2 | 150 | StimA | 1.0 | 10.0 |
0 | 45 | StimB | 1.0 | 10.0 |
1 | 115 | StimB | 1.0 | 10.0 |
2 | 185 | StimB | 1.0 | 10.0 |
betas_S1D1 | |||
---|---|---|---|
regressor | channel | chromo | |
HRF StimA | S1D1 | HbO | 0.711062 |
HbR | -0.175408 | ||
S1D2 | HbO | -0.008824 | |
HbR | -0.004429 | ||
HRF StimB | S1D1 | HbO | -0.019069 |
HbR | 0.006574 | ||
S1D2 | HbO | 0.510643 | |
HbR | -0.125420 | ||
Drift 0 | S1D1 | HbO | 0.510514 |
HbR | 0.497951 | ||
S1D2 | HbO | 0.257398 | |
HbR | 0.249386 |
[12]:
pred = glm.predict(ts, betas, dm_other, channel_wise_regressors_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)
pred = glm.predict(
ts,
betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
dm_other,
channel_wise_regressors_other,
)
plot_data_to_fit_comparison(ts, pred, stim_other)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB 0.5105 0.498 0.2574 0.2494 0.5105 0.498 ... 0.2494 0.5105 0.498 0.2574 0.2494 Coordinates: * time (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9 * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) <U4 32B 'S1D1' 'S1D2'


Fitting with multiple gaussian kernels
[13]:
dm_other, channel_wise_regressors_other = glm.make_design_matrix(
ts,
None,
stim_other,
None,
basis_function=glm.GaussianKernels(
t_pre=5 * units.s, t_post=30 * units.s, t_delta=3 * units.s, t_std=2 * units.s
),
drift_order=0,
short_channel_method=None,
)
betas = glm.fit(ts, dm_other, channel_wise_regressors_other, noise_model="ols")
f,ax = p.subplots(1,1, figsize=(12,4))
for reg in dm_other.regressor.values:
p.plot(dm_other.time, dm_other.sel(chromo="HbO", regressor=reg), label=reg)
plots.plot_stim_markers(ax, stim_other, y=1.)
p.legend(ncol=3, loc="center right")
p.xlim(0,90)
[13]:
(0.0, 90.0)

[14]:
# translate the xr.DataArray into a pd.DataFrame which are displayed as tables
display(betas.rename("betas_S1D1").to_dataframe())
betas_S1D1 | |||
---|---|---|---|
regressor | channel | chromo | |
HRF StimA 00 | S1D1 | HbO | -0.019340 |
HbR | 0.001062 | ||
S1D2 | HbO | -0.016493 | |
HbR | -0.009073 | ||
HRF StimA 01 | S1D1 | HbO | 0.009910 |
... | ... | ... | ... |
HRF StimB 10 | S1D2 | HbR | 0.015613 |
Drift 0 | S1D1 | HbO | 0.499179 |
HbR | 0.500876 | ||
S1D2 | HbO | 0.244598 | |
HbR | 0.250009 |
92 rows × 1 columns
[15]:
pred = glm.predict(ts, betas, dm_other, channel_wise_regressors_other)
display(pred)
plot_data_to_fit_comparison(ts, pred, stim_other)
pred = glm.predict(
ts,
betas.sel(regressor=betas.regressor.str.startswith("HRF Stim")),
dm_other,
channel_wise_regressors_other,
)
plot_data_to_fit_comparison(ts, pred, stim_other)
<xarray.DataArray (time: 2400, channel: 2, chromo: 2)> Size: 77kB 0.4992 0.5009 0.2446 0.25 0.4992 0.5009 ... 0.25 0.4992 0.5009 0.2446 0.25 Coordinates: * time (time) float64 19kB 0.0 0.1 0.2 0.3 0.4 ... 239.6 239.7 239.8 239.9 * chromo (chromo) <U3 24B 'HbO' 'HbR' * channel (channel) <U4 32B 'S1D1' 'S1D2'

