cedalion.sim.datasets package

Submodules

cedalion.sim.datasets.synthetic_fnirs_eeg module

Simulate bimodal data as a toy model for concurrent EEG–fNIRS data.

Follows an approach inspired by Dähne et al., 2014.

Reference: https://doi.org/10.1016/j.neuroimage.2014.01.014

class cedalion.sim.datasets.synthetic_fnirs_eeg.BimodalToyDataSimulation(config, seed=None, mixing_type='structured')[source]

Bases: object

Simulate bimodal data as a toy model for concurrent EEG-fNIRS data.

This class creates coupled (bimodal) synthetic signals representing an “X” modality (e.g., EEG) and a “Y” modality (e.g., fNIRS). It provides utilities for simulating sources, generating spatial mixing patterns, projecting to channels via a simple forward model, computing power, and basic visualization.

Toy data description:

EEG (x) and fNIRS (y) recordings are generated from a pseudo-random linear mixing forward model. Sources s_x and s_y are divided into background sources (independent between modalities) and target sources (co-modulated across modalities). Each EEG background source is constructed from a random oscillatory signal in a chosen frequency band multiplied by a slowly varying random amplitude-modulation function. This amplitude modulation acts as the envelope of s_x and provides an estimate of its bandpower timecourse. fNIRS background sources are generated directly from slowly varying amplitude-modulation functions in the same way.

Target sources are built similarly, except that the same envelope modulating s_x is also used for the corresponding fNIRS source s_y. This coupling may be delayed by a time-lag parameter dT to simulate realistic physiological delays between modalities. fNIRS sources and recordings are then downsampled to epoch intervals of length T_epoch.

Background mixing matrices A_x and A_y are drawn from normal distributions, while target mixing matrices use Gaussian radial basis functions (RBFs) centered on shared positions, plus white noise, to create spatial patterns with local structure.

Signal-to-noise ratio (SNR) is controlled by the parameter gamma, which weights target versus background contributions in channel space. The relationship with decibel units is given by SNR [dB] = 20 * log10(gamma).
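The stated relation SNR [dB] = 20 * log10(gamma) is easy to verify directly; a minimal sketch (helper names are illustrative, not part of the module):

```python
import numpy as np

def gamma_to_db(gamma):
    """Convert the mixture weight gamma to SNR in decibels."""
    return 20.0 * np.log10(gamma)

def db_to_gamma(snr_db):
    """Invert the relation: gamma = 10 ** (SNR / 20)."""
    return 10.0 ** (snr_db / 20.0)
```

For example, gamma = 1 corresponds to 0 dB (target and noise equally weighted), and gamma = 10 corresponds to 20 dB.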

EEG channel recordings and their bandpower timecourses are available via the attributes x (high-rate channels) and x_power (epoch-downsampled). The downsampled x_power is aligned with the fNIRS recordings y. Target sources are accessible as sx_t and sy_t; the bandpower timecourse of the former is available as sx_power.

Parameters:
  • config (str | dict) – Path to a YAML config file or a dictionary containing simulation parameters. See generate_args() for derived fields.

  • seed (int | None) – Random seed for reproducibility. If None, uses the current UNIX timestamp.

  • mixing_type (str) – Type of mixing to generate for target sources. 'structured' assigns localized RBF-like patterns; any other value leaves patterns purely random.

args[source]

Simulation parameters.

Type:

argparse.Namespace

seed[source]

Random seed used for reproducibility.

Type:

int

sx_montage[source]

Montage for X channels.

Type:

xr.DataArray

sy_montage[source]

Montage for Y channels.

Type:

xr.DataArray

s_labels[source]

Labels for target sources.

Type:

list[str]

s_target_positions[source]

2D positions of target sources.

Type:

xr.DataArray

sx_t[source]

Target sources for X modality, dims ('source', 'time').

Type:

xr.DataArray

sy_t[source]

Target sources for Y modality, dims ('source', 'time'), sampled over epochs.

Type:

xr.DataArray

sx_ba[source]

Background sources for X modality, dims ('source', 'time').

Type:

xr.DataArray

sy_ba[source]

Background sources for Y modality, dims ('source', 'time'), sampled over epochs.

Type:

xr.DataArray

sx_power[source]

Power of target sources for X modality, dims ('source', 'time'), sampled over epochs.

Type:

xr.DataArray

Ax[source]

Mixing matrix for X modality, dims ('channel', 'source').

Type:

xr.DataArray

ax_t[source]

Mixing patterns for target sources in X, dims ('channel', 'source').

Type:

xr.DataArray

ax_ba[source]

Mixing patterns for background sources in X, dims ('channel', 'source').

Type:

xr.DataArray

Ay[source]

Mixing matrix for Y modality, dims ('channel', 'source').

Type:

xr.DataArray

ay_t[source]

Mixing patterns for target sources in Y, dims ('channel', 'source').

Type:

xr.DataArray

ay_ba[source]

Mixing patterns for background sources in Y, dims ('channel', 'source').

Type:

xr.DataArray

x[source]

Observed channels for X modality, dims ('channel', 'time').

Type:

xr.DataArray

x_power[source]

Power of observed channels for X modality, dims ('channel', 'time'), sampled over epochs.

Type:

xr.DataArray

y[source]

Observed channels for Y modality, dims ('channel', 'time'), sampled over epochs.

Type:

xr.DataArray

preprocess_data(train_test_split=0.8)[source]

Normalize and split the simulated data into train/test subsets.

Parameters:

train_test_split (float) – Fraction of the time axis to use for the training split (0 < train_test_split < 1). The remainder is used for testing.

Returns:

A dictionary with keys:
  • 'x_train' (xr.DataArray): X channels (train).

  • 'x_test' (xr.DataArray): X channels (test).

  • 'x_power_train' (xr.DataArray): X power (train).

  • 'x_power_test' (xr.DataArray): X power (test).

  • 'y_train' (xr.DataArray): Y channels (train).

  • 'y_test' (xr.DataArray): Y channels (test).

  • 'sx' (xr.DataArray): Target X sources, test portion only.

  • 'sx_power' (xr.DataArray): Target X source power, test only.

  • 'sy' (xr.DataArray): Target Y sources, test portion only.

Return type:

dict
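A minimal numpy sketch of the split along the time axis (the actual method also normalizes the data and returns xr.DataArray objects keyed as listed above; the helper name is illustrative):

```python
import numpy as np

def split_time(x, train_test_split=0.8):
    """Split an array with time as the last axis into train/test parts."""
    n_train = int(x.shape[-1] * train_test_split)
    return x[..., :n_train], x[..., n_train:]

x = np.random.randn(4, 100)            # (channel, time)
x_train, x_test = split_time(x, 0.8)   # shapes (4, 80) and (4, 20)
```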

static generate_montage(Nc, channel_label)[source]

Create a symmetric 2D montage of channels within [0, 1] x [0, 1].

The function builds the smallest grid with at least Nc points, ranks points by distance to the center, and selects the closest Nc points for an aesthetically symmetric montage.

Parameters:
  • Nc (int) – Number of channels to place.

  • channel_label (str) – Prefix used to name channels (e.g., 'X' or 'Y'). Channels are labeled as '<label><idx>' starting at 1.

Returns:

Array of shape (channel, dim) with coordinates channel=[<label>1, ..., <label>Nc] and dim=['x','y'] containing the 2D positions in the unit square.

Return type:

xr.DataArray
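The grid-and-rank procedure described above can be sketched with numpy as follows (tie-breaking among equidistant points may differ from the actual implementation, and the real function returns an xr.DataArray):

```python
import numpy as np

def generate_montage_sketch(Nc, channel_label):
    """Smallest square grid with >= Nc points; keep the Nc points
    closest to the center of the unit square."""
    n = int(np.ceil(np.sqrt(Nc)))                    # grid side length
    g = np.linspace(0.0, 1.0, n)
    pts = np.array([(x, y) for x in g for y in g])   # (n*n, 2) grid points
    d = np.linalg.norm(pts - 0.5, axis=1)            # distance to center
    sel = pts[np.argsort(d)[:Nc]]                    # Nc most central points
    labels = [f"{channel_label}{i + 1}" for i in range(Nc)]
    return labels, sel
```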

simulate_sources()[source]

Simulate target and background sources for both modalities.

Target sources are constructed from band-limited oscillations with amplitude modulation and a modality-dependent temporal shift. Background sources are independent amplitude-modulated oscillations.

Parameters:

None

Returns:

(sx_t, sy_t, sx_ba, sy_ba) where
  • sx_t: target sources for X, dims ('source','time') of length (Ns_target, Nt).

  • sy_t: target sources for Y, dims ('source','time') of length (Ns_target, Ne) (epoch-averaged amplitude).

  • sx_ba: background sources for X, dims ('source','time') of length (Ns_ba, Nt).

  • sy_ba: background sources for Y, dims ('source','time') of length (Ns_ba, Ne).

Return type:

tuple[xr.DataArray, xr.DataArray, xr.DataArray, xr.DataArray]

simulate_random_source(T=None, Nt=None)[source]

Simulate a single random oscillatory source within (f_min, f_max).

The source is synthesized in the frequency domain with unit amplitude and random phases in the desired band, then transformed back to time via an inverse FFT and envelope-normalized to unity.

Parameters:
  • T (float | None) – Total duration in seconds. Defaults to self.args.T when None.

  • Nt (int | None) – Number of time samples. Defaults to self.args.Nt when None.

Returns:

Time-domain signal of shape (Nt,) with unit envelope (via analytic signal magnitude normalization).

Return type:

np.ndarray
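The frequency-domain synthesis described above can be sketched as follows, assuming unit spectral amplitude with uniform random phases in the band and envelope normalization via the analytic signal (details such as band-edge handling may differ from the implementation):

```python
import numpy as np
from scipy.signal import hilbert

def simulate_random_source_sketch(T, Nt, f_min, f_max, rng=None):
    """Band-limited oscillation: unit spectral amplitude with random
    phases inside (f_min, f_max), inverse FFT, envelope-normalized."""
    rng = np.random.default_rng(rng)
    freqs = np.fft.rfftfreq(Nt, d=T / Nt)
    spec = np.zeros(freqs.size, dtype=complex)
    band = (freqs > f_min) & (freqs < f_max)
    spec[band] = np.exp(1j * rng.uniform(0, 2 * np.pi, band.sum()))
    sig = np.fft.irfft(spec, n=Nt)
    return sig / np.abs(hilbert(sig))   # unit-envelope normalization
```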

simulate_amplitude(Nt=None)[source]

Simulate an amplitude modulation signal from low-pass noise.

Parameters:

Nt (int | None) – Number of samples to generate. Defaults to self.args.Nt when None.

Returns:

Positive amplitude modulation of shape (Nt,) scaled to [0, 1].

Return type:

np.ndarray
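A minimal sketch of low-pass-noise amplitude modulation (the cutoff and the exact scaling convention are assumptions; the source only states low-pass noise scaled to [0, 1]):

```python
import numpy as np
from scipy.signal import butter, lfilter

def simulate_amplitude_sketch(Nt, fs, cut=0.1, order=5, rng=None):
    """Low-pass filtered white noise, rescaled to the range [0, 1]."""
    rng = np.random.default_rng(rng)
    b, a = butter(order, cut / (0.5 * fs), btype="low")
    am = lfilter(b, a, rng.standard_normal(Nt))
    am -= am.min()              # shift minimum to 0
    return am / am.max()        # scale maximum to 1
```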

generate_patterns(mixing_type='structured')[source]

Generate mixing matrices and split them into target/background.

Parameters:

mixing_type (str) – If 'structured', assign localized RBF-like patterns to target sources based on distances from true source positions to channel locations; otherwise leave as random.

Returns:

(Ax, ax_t, ax_ba, Ay, ay_t, ay_ba) where each is an xr.DataArray with dims ('channel','source'). *_t contains only the first Ns_target sources and *_ba the background sources.

Return type:

tuple
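One column of a structured target mixing matrix, as described above, can be sketched as a Gaussian RBF over channel–source distances plus white noise (the noise scaling is an assumption):

```python
import numpy as np

def rbf_pattern_sketch(chan_pos, src_pos, ell, noise_std=0.1, rng=None):
    """Gaussian RBF spatial pattern centered on a source position,
    plus white noise (one column of a target mixing matrix)."""
    rng = np.random.default_rng(rng)
    d2 = np.sum((chan_pos - src_pos) ** 2, axis=1)   # squared distances
    pattern = np.exp(-d2 / (2 * ell ** 2))           # localized RBF bump
    return pattern + noise_std * rng.standard_normal(len(chan_pos))
```

A channel at the source position receives weight near 1, while distant channels decay toward the noise floor at the length scale ell (ellx/elly in the config).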

forward_model(a_t, a_ba, s_t, s_ba)[source]

Project sources to channels and add background/noise.

Parameters:
  • a_t (xr.DataArray) – Mixing patterns for target sources with dims ('channel','source').

  • a_ba (xr.DataArray) – Mixing patterns for background sources with dims ('channel','source').

  • s_t (xr.DataArray) – Target source time series with dims ('source','time').

  • s_ba (xr.DataArray) – Background source time series with dims ('source','time').

Returns:

(x_t, x_noise) where x_t is the projected target-only signal and x_noise is the combination of background and Gaussian noise (both Frobenius-normalized), each with dims ('channel','time').

Return type:

tuple[xr.DataArray, xr.DataArray]
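A minimal numpy sketch of this projection (the exact weighting of the noise term is an assumption; the real method operates on xr.DataArray objects and uses the sigma_noise config parameter):

```python
import numpy as np

def f_norm(x):
    """Frobenius normalization: divide by the root sum of squares."""
    return x / np.sqrt(np.sum(x ** 2))

def forward_model_sketch(a_t, a_ba, s_t, s_ba, sigma_noise=1.0, rng=None):
    """Project sources to channels: target part, plus a noise part
    combining Frobenius-normalized background and Gaussian noise."""
    rng = np.random.default_rng(rng)
    x_t = a_t @ s_t                                   # (channel, time)
    noise = sigma_noise * rng.standard_normal(x_t.shape)
    x_noise = f_norm(a_ba @ s_ba) + f_norm(noise)
    return x_t, x_noise
```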

get_channels(x_t, x_noise, gamma, calculate_power=False)[source]

Combine target and noise components to form observed channels.

Parameters:
  • x_t (xr.DataArray) – Target-only channels with dims ('channel','time').

  • x_noise (xr.DataArray) – Background+noise channels with dims ('channel','time').

  • gamma (float) – Mixture weight controlling SNR; higher values weight the target more heavily.

  • calculate_power (bool) – If True, also compute envelope-based power per-epoch for each channel.

Returns:

If calculate_power is False, returns x (channels). If True, returns (x, x_power) where x_power has dims ('channel','time') over epochs.

Return type:

xr.DataArray | tuple[xr.DataArray, xr.DataArray]

plot_targets(xlim=None, ylim=None)[source]

Plot target sources and their envelopes/power with optional limits.

Parameters:
  • xlim (tuple[float, float] | None) – (xmin, xmax) for the x-axis.

  • ylim (tuple[float, float] | None) – (ymin, ymax) for the y-axis.

Returns:

None

plot_channels(N=2, xlim=None, ylim=None)[source]

Plot N pairs of channels for X and Y modalities.

Parameters:
  • N (int) – Number of top-index channels to plot.

  • xlim (tuple[float, float] | None) – (xmin, xmax) for the x-axis.

  • ylim (tuple[float, float] | None) – (ymin, ymax) for the y-axis.

Returns:

None

plot_mixing_patterns(Ax=None, Ay=None, cmap='viridis', activity_size=200, title=None)[source]

Plot mixing patterns for target sources across both modalities.

This method generates a scatter plot of mixing patterns Ax and Ay for sources, in a 2D grid following the X and Y montages, respectively. The scatter points represent the activity level at each channel. Each source is represented in a separate row, with Ax on the left and Ay on the right.

Parameters:
  • Ax (np.ndarray | xr.DataArray | None) – Mixing matrix for X modality with dims/shape (n_channels, n_sources). If None, uses self.ax_t.

  • Ay (np.ndarray | xr.DataArray | None) – Mixing matrix for Y modality with dims/shape (n_channels, n_sources). If None, uses self.ay_t.

  • cmap (str) – Matplotlib colormap name used to render activity.

  • activity_size (float) – Marker size for scatter points.

  • title (str | None) – Optional figure title.

Returns:

None

cedalion.sim.datasets.synthetic_fnirs_eeg.generate_args(config)[source]

Read arguments from config and extend with derived parameters.

Parameters:

config (str | dict) – Path to the configuration YAML file or a dict containing simulation parameters. Expected base keys include T, rate, T_epoch, dT, f_min, f_max, Ns_all, Ns_target, Nx, Ny, ellx, elly, sigma_noise, gamma, and gamma_e.

Returns:

Namespace with original and derived fields (e.g., Nt, NdT, e_len, Ne, Nde, and Ns_ba).

Return type:

argparse.Namespace
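One plausible mapping from the base config keys to the derived fields; the exact formulas below are assumptions for illustration, not taken from the implementation:

```python
# Hypothetical base config values (not from the source):
cfg = dict(T=600.0, rate=100.0, T_epoch=1.0, dT=0.5,
           Ns_all=10, Ns_target=2)

Nt    = int(cfg["T"] * cfg["rate"])          # total time samples
NdT   = int(cfg["dT"] * cfg["rate"])         # coupling time lag in samples
e_len = int(cfg["T_epoch"] * cfg["rate"])    # samples per epoch
Ne    = Nt // e_len                          # number of epochs
Ns_ba = cfg["Ns_all"] - cfg["Ns_target"]     # background source count
```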

cedalion.sim.datasets.synthetic_fnirs_eeg.set_seed(seed)[source]

Set the global random seeds for reproducibility.

Parameters:

seed (int) – Seed value to set for numpy and Python’s random.

Returns:

None

cedalion.sim.datasets.synthetic_fnirs_eeg.butter_lowpass(cut, fs, order)[source]

Construct a digital Butterworth low-pass filter.

Parameters:
  • cut (float) – Cutoff frequency in Hz.

  • fs (float) – Sampling rate in Hz.

  • order (int) – Filter order.

Returns:

Numerator b and denominator a filter coefficients suitable for scipy.signal.lfilter().

Return type:

tuple[np.ndarray, np.ndarray]

cedalion.sim.datasets.synthetic_fnirs_eeg.butter_lowpass_filter(data, cut, fs, order=5)[source]

Apply a Butterworth low-pass filter to a 1D signal.

Parameters:
  • data (np.ndarray) – Input time series of shape (N,).

  • cut (float) – Cutoff frequency in Hz.

  • fs (float) – Sampling rate in Hz.

  • order (int) – Filter order.

Returns:

Filtered signal of the same shape as data.

Return type:

np.ndarray
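A minimal self-contained sketch of the design-and-apply pair using scipy.signal (a 30 Hz component is suppressed while a 1 Hz component passes a 5 Hz cutoff):

```python
import numpy as np
from scipy.signal import butter, lfilter

def butter_lowpass(cut, fs, order):
    """Digital Butterworth low-pass design (cutoff normalized to Nyquist)."""
    return butter(order, cut / (0.5 * fs), btype="low")

fs = 100.0
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 1.0 * t) + np.sin(2 * np.pi * 30.0 * t)
b, a = butter_lowpass(cut=5.0, fs=fs, order=5)
filtered = lfilter(b, a, sig)   # 30 Hz strongly attenuated, 1 Hz kept
```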

cedalion.sim.datasets.synthetic_fnirs_eeg.f_normalize(x)[source]

Frobenius-normalize an array.

The array is divided by the square root of the sum of squares of all entries. Useful to normalize channel matrices or multichannel signals.

Parameters:

x (np.ndarray | xr.DataArray) – Input array.

Returns:

Normalized array with the same shape and type as x.

Return type:

np.ndarray | xr.DataArray
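The operation reduces to a one-liner with numpy (the real function also handles xr.DataArray inputs):

```python
import numpy as np

def f_normalize(x):
    """Divide by the Frobenius norm (sqrt of the sum of squared entries)."""
    return x / np.sqrt(np.sum(np.asarray(x) ** 2))

A = np.array([[3.0, 0.0], [0.0, 4.0]])   # Frobenius norm 5
An = f_normalize(A)                       # result has Frobenius norm 1
```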

cedalion.sim.datasets.synthetic_fnirs_eeg.standardize(x, dim='time')[source]

Z-score standardize along a dimension (for xarray) or globally (numpy).

Parameters:
  • x (xr.DataArray | np.ndarray) – Input array to standardize.

  • dim (str) – Dimension name along which to standardize when x is an xr.DataArray. Ignored for np.ndarray.

Returns:

Standardized array with mean 0 and std 1.

Return type:

xr.DataArray | np.ndarray
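A numpy sketch of the per-dimension z-scoring (for xr.DataArray inputs the real function selects the dimension by name rather than by axis):

```python
import numpy as np

def standardize(x, axis=-1):
    """Z-score along an axis: subtract the mean, divide by the std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
z = standardize(x)   # each row now has mean 0 and std 1
```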

cedalion.sim.datasets.synthetic_fnirs_eeg.split_epochs(x, e_len)[source]

Split a 1D signal into non-overlapping epochs of a given length.

Parameters:
  • x (np.ndarray) – 1D input signal of length N.

  • e_len (int) – Epoch length in samples.

Returns:

Array of shape (Ne, e_len) where Ne = N // e_len.

Return type:

np.ndarray
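Since Ne = N // e_len, trailing samples that do not fill a whole epoch are dropped; this reduces to a reshape:

```python
import numpy as np

def split_epochs(x, e_len):
    """Reshape a 1D signal into non-overlapping epochs, dropping the
    trailing samples that do not fill a whole epoch."""
    Ne = len(x) // e_len
    return x[: Ne * e_len].reshape(Ne, e_len)

x = np.arange(10)
ep = split_epochs(x, 3)   # shape (3, 3); trailing sample 9 is dropped
```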

Module contents

Synthetic toy datasets.