cedalion.mlutils package

Submodules

cedalion.mlutils.cv module

cedalion.mlutils.cv.create_cv_splits(
df_stim: DataFrame,
n_splits: int,
) -> Generator[tuple[DataFrame, DataFrame], None, None]

Split stimulus events into train and test sets for cross-validation.

Parameters:
  • df_stim – Stimulus events, sorted by onset time, with an ordered index.

  • n_splits – Number of folds.

Yields:

For each fold, the stimulus data frame split into a train set and a test set.

The test trials are consecutive and not randomized.
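The consecutive, non-randomized splitting described above can be illustrated with a minimal sketch. This is not the cedalion implementation: the helper name consecutive_folds is hypothetical, and the way leftover trials are distributed across folds (the first folds receive one extra trial) is an assumption.

```python
from collections.abc import Iterator


def consecutive_folds(
    n_trials: int, n_splits: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_positions, test_positions) with contiguous test blocks.

    Hypothetical helper, not part of cedalion.
    """
    if not 1 < n_splits <= n_trials:
        raise ValueError("need 1 < n_splits <= n_trials")

    base, extra = divmod(n_trials, n_splits)
    start = 0
    for fold in range(n_splits):
        # Assumption: the first `extra` folds take one additional trial.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_trials))
        yield train, test
        start = stop


folds = list(consecutive_folds(n_trials=10, n_splits=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

With a stimulus data frame sorted by onset, the positions could be applied as df_stim.iloc[train] and df_stim.iloc[test] to obtain the two frames for each fold.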

cedalion.mlutils.cv.mask_design_matrix(
dms: DesignMatrix,
df_stim_test: DataFrame,
before: Annotated[Quantity, '[time]'] = <Quantity(5, 'second')>,
after: Annotated[Quantity, '[time]'] = <Quantity(20, 'second')>,
) -> DesignMatrix

Mask a segment of the design matrix by setting it to zero.

When GLM parameters are used as features, the fit must not have access to the test trials. This function zeros out a contiguous segment of the design matrix, so that the model cannot explain the time course in the masked segment for any choice of parameters. The segment extends from the earliest to the latest trial in df_stim_test, padded by the additional time specified by the before and after parameters. Because the masked segment is contiguous, the train-test split must be chosen such that the test trials are consecutive.

Parameters:
  • dms – The design matrix to mask.

  • df_stim_test – Test set of stimulus events.

  • before – Time to pad before the earliest test trial.

  • after – Time to pad after the latest test trial.

Returns:

A copy of the design matrix with the masked segment set to zero.
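The masking idea can be sketched with plain Python lists instead of the real DesignMatrix type. The helpers mask_window and zero_segment are hypothetical. Two assumptions are made here that the text above does not confirm: the latest trial is taken to end at its onset plus its duration, and the window bounds are treated as inclusive.

```python
def mask_window(onsets, durations, before=5.0, after=20.0):
    """Return (t_start, t_stop) in seconds spanning all test trials plus padding.

    Assumption: the latest trial ends at onset + duration.
    """
    t_start = min(onsets) - before
    t_stop = max(o + d for o, d in zip(onsets, durations)) + after
    return t_start, t_stop


def zero_segment(time, design, t_start, t_stop):
    """Return a copy of `design` with every row inside the window set to zero.

    `design[i]` holds the regressor values at `time[i]`.
    """
    return [
        [0.0] * len(row) if t_start <= t <= t_stop else list(row)
        for t, row in zip(time, design)
    ]


# Toy data: 10 samples at 10 s spacing, two regressors.
time = [10.0 * i for i in range(10)]
design = [[1.0, 0.5] for _ in time]

# Two consecutive test trials with onsets at 30 s and 40 s, each lasting 10 s.
t_start, t_stop = mask_window([30.0, 40.0], [10.0, 10.0])
masked = zero_segment(time, design, t_start, t_stop)
```

Because every regressor is zero inside the window, no choice of GLM coefficients can produce a non-zero prediction there, which is what keeps the test trials out of the fit.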

cedalion.mlutils.features module

Feature extraction from epoched fNIRS data for use with scikit-learn pipelines.

cedalion.mlutils.features.epoch_features(
epochs: DataArray,
feature_types: list[Literal['slope', 'mean', 'max', 'min', 'auc']],
reltime_slices: dict[Literal['slope', 'mean', 'max', 'min', 'auc'], slice] | None = None,
)

Extract scalar features from epoched data for use in ML classifiers.

For each requested feature type, a scalar value is computed over the "reltime" axis (optionally restricted to a sub-window). All non-epoch dimensions (channel, chromo, …) are then stacked into a flat "feature" dimension so the result is suitable as a 2-D feature matrix for scikit-learn estimators (rows = epochs, columns = features).

Parameters:
  • epochs – DataArray with at least an "epoch" dimension and a "reltime" dimension.

  • feature_types – One or more of "slope", "mean", "max", "min", "auc". A string is also accepted as a shorthand for a single-element list.

  • reltime_slices – Optional mapping from feature type to a slice of relative-time values used to restrict the window for that feature. Unspecified feature types use the full reltime range.

Returns:

xr.DataArray with dimensions (epoch, feature) where feature is a multi-index stacking all non-epoch, non-reltime dimensions and the feature_type label.

Raises:

ValueError – If an unrecognised feature type is requested.
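The reduction over "reltime" followed by flattening into feature columns can be sketched in plain Python. This is an illustration only, not the cedalion code: feature_matrix is a hypothetical helper, epochs are nested lists rather than a DataArray, "slope" is assumed to be a least-squares slope, "auc" is assumed to use the trapezoidal rule, and window bounds are assumed to be inclusive.

```python
def slope(t, y):
    """Least-squares slope of y over t."""
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def auc(t, y):
    """Area under the curve by the trapezoidal rule."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(t) - 1))


FEATURES = {
    "mean": lambda t, y: sum(y) / len(y),
    "max": lambda t, y: max(y),
    "min": lambda t, y: min(y),
    "slope": slope,
    "auc": auc,
}


def feature_matrix(epochs, reltime, feature_types, windows=None):
    """Return rows = epochs, columns = (feature type, channel) pairs.

    `epochs[e][c]` is the list of samples of channel c in epoch e.
    `windows` maps a feature type to an inclusive (start, stop) reltime range.
    """
    windows = windows or {}
    unknown = [ft for ft in feature_types if ft not in FEATURES]
    if unknown:
        raise ValueError(f"unrecognised feature types: {unknown}")

    X = []
    for epoch in epochs:
        row = []
        for ft in feature_types:
            # Feature types without a window use the full reltime range.
            lo, hi = windows.get(ft, (reltime[0], reltime[-1]))
            keep = [i for i, t in enumerate(reltime) if lo <= t <= hi]
            t = [reltime[i] for i in keep]
            for channel in epoch:
                row.append(FEATURES[ft](t, [channel[i] for i in keep]))
        X.append(row)
    return X


reltime = [0.0, 1.0, 2.0, 3.0, 4.0]
epochs = [
    [[0.0, 1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0, 0.0]],  # epoch 0, 2 channels
    [[1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 4.0, 6.0, 8.0]],  # epoch 1, 2 channels
]
X = feature_matrix(epochs, reltime, ["mean", "slope"], windows={"slope": (0.0, 2.0)})
```

Each row of X has four columns here (mean and slope for each of the two channels), which is the 2-D shape that scikit-learn estimators expect.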

Module contents