cedalion.sigdecomp.multimodal package
Submodules
cedalion.sigdecomp.multimodal.cca_models module
Module for CCA-like models
- class cedalion.sigdecomp.multimodal.cca_models.MultimodalSourceDecomposition(
- N_components: int = None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
object
Class for decomposing multimodal data, X and Y, into latent sources using linear filters.
This base class is inherited by other source decomposition methods, such as ElasticNetCCA, ssCCA, and PLS. It implements methods to validate input dimensions, apply normalization, and transform data from two modalities using filters learned during training.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- validate_inputs_fit(
- X: DataArray,
- Y: DataArray,
Validates input data of fit function and returns it with the correct dimensions and labels.
This method ensures that the input data to the fit function, X and Y, i.e. those used for training, have the expected dimension labels and sizes and returns them with the dimensions ordered as (sample_name, feature_name). It also initializes the number of samples and features.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
- A tuple (X, Y) where:
X (DataArray): Input data for modality X ordered as (sample_name, featureX_name).
Y (DataArray): Input data for modality Y ordered as (sample_name, featureY_name).
- Return type:
tuple
- validate_inputs_transform(
- X: DataArray,
- Y: DataArray,
Validates that the to-be-transformed data have the expected dimension labels and sizes.
This method ensures that X and Y have the same dimension labels and number of features as those used during training. The number of time points, however, can be different.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
- A tuple (X, Y) where:
X (DataArray): Input data for modality X ordered as (sample_name, featureX_name).
Y (DataArray): Input data for modality Y ordered as (sample_name, featureY_name).
- Return type:
tuple
- normalization_fit(
- X: DataArray,
- Y: DataArray,
Normalize input data and save the normalization parameters (mean and std) for later use.
This method centers and scales the data for both modalities along the sample dimension. It computes the mean and standard deviation for X and Y using the provided standardization function, updating the corresponding class attributes.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
A tuple (X, Y) of standardized data arrays for modalities X and Y.
- Return type:
tuple
- normalization_transform(
- X: DataArray,
- Y: DataArray,
Applies normalization to input data using trained parameters.
This method standardizes the input data arrays X and Y using the normalization parameters (mean and standard deviation) computed during the fitting process.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
A tuple (X, Y) of normalized data arrays.
- Return type:
tuple
- convert_filters_to_DataArray(
- Wx: ndarray,
- Wy: ndarray,
- X: DataArray,
- Y: DataArray,
Convert filters Wx and Wy from numpy array format to DataArrays with the right dimensions and coordinates.
- Parameters:
Wx (ndarray) – Filter matrix for modality X with shape (Nx, N_components).
Wy (ndarray) – Filter matrix for modality Y with shape (Ny, N_components).
X (DataArray) – DataArray containing the features of modality X.
Y (DataArray) – DataArray containing the features of modality Y.
- Returns:
A tuple (Wx_xr, Wy_xr) containing the DataArray versions of Wx and Wy respectively.
- Return type:
tuple[DataArray, DataArray]
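For orientation, here is a minimal sketch of the kind of wrapping this method performs. The dimension and coordinate names below are assumptions for the example, not necessarily the exact labels produced by the class:

```python
import numpy as np
import xarray as xr

# Hypothetical sizes: Nx features in X, Ny in Y, and 3 latent components.
Nx, Ny, Nc = 8, 5, 3
Wx = np.random.randn(Nx, Nc)
Wy = np.random.randn(Ny, Nc)

# Assumed dimension/coordinate labels; the class derives them from the training data.
Wx_xr = xr.DataArray(
    Wx,
    dims=("channel", "latent_channel"),
    coords={"channel": [f"chX{i}" for i in range(Nx)], "latent_channel": np.arange(Nc)},
)
Wy_xr = xr.DataArray(
    Wy,
    dims=("channel", "latent_channel"),
    coords={"channel": [f"chY{i}" for i in range(Ny)], "latent_channel": np.arange(Nc)},
)
```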
- transform(
- X: DataArray,
- Y: DataArray,
Apply the linear transformation on the input data using learned filters.
This method validates the dimension labels and sizes of the input data to ensure consistency with the training data, applies normalization using the stored parameters, and then projects the normalized data onto a lower-dimensional space using the learned filters Wx and Wy. It returns the transformed arrays, a.k.a. the reconstructed sources.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
- Returns:
- A tuple (X_new, Y_new) where:
X_new (DataArray): Transformed data for modality X.
Y_new (DataArray): Transformed data for modality Y.
- Return type:
tuple
- class cedalion.sigdecomp.multimodal.cca_models.ElasticNetCCA(
- N_components: int = None,
- l1_reg: float | list[float, float] = 0,
- l2_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- pls: bool = False,
Bases:
MultimodalSourceDecomposition
Perform Elastic Net Canonical Correlation Analysis (CCA) between two datasets X and Y.
Apply CCA with L1 + L2 regularization, a.k.a. elastic net. The algorithm finds sparse (L1) and normalized (L2) vectors Wx and Wy as the solution to the following constrained optimization problem:
maximize Wx^T Cxy Wy subject to Wx^T Cx Wx = 1, Wy^T Cy Wy = 1,
||Wx||_1 <= c1x, ||Wy||_1 <= c1y, ||Wx||^2_2 <= c2x, ||Wy||^2_2 <= c2y
where Cx, Cy, and Cxy are the individual and cross-covariance matrices between the X and Y datasets, and the last four constraints correspond to the standard L1-norm and L2-norm penalization terms. c1x and c1y control sparsity, while c2x and c2y control the magnitude of the vectors. PLS algorithms are also captured by this algorithm by setting Cx and Cy to the identity matrices.
For the one-unit algorithm, a (sparse) SVD is performed on the whitened cross-covariance matrix K = Cx^(-1/2) Cxy Cy^(-1/2) (reduced to K = Cxy for PLS), using the following standard alternating power method (based on Parkhomenko et al. [PTB09]):
- Update u:
u <- K * v
u <- u / ||u||
- If L1:
u <- SoftThresholding(u, lambda_u/2)
u <- u / ||u||
- Update v:
v <- K^T * u
v <- v / ||v||
- If L1:
v <- SoftThresholding(v, lambda_v/2)
v <- v / ||v||
The resulting u and v are the leading left and right singular vectors of K, which are precisely the individual components of the filters Wx and Wy. The soft-thresholding function brings some components to zero. If L2 regularization is used, Cx and Cy are shifted by Cx <- Cx + alpha_x I and Cy <- Cy + alpha_y I prior to computing K.
Multiple components are obtained via a deflation method, subtracting from K its rank-1 approximation on each iteration. The returned vectors Wx and Wy are ordered in descending order w.r.t. the singular values, which coincide with the canonical correlations.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
l2_reg (float or list of floats) – List containing alpha_x and alpha_y (see above). If a single float is provided, then alpha_x = alpha_y.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
pls (bool) – Whether to perform PLS regression. Defaults to False.
- Wx[source]
Linear filters for dataset X with dimensions (featureX_name, latent_featureX_name).
- Type:
DataArray
- Wy[source]
Linear filters for dataset Y with dimensions (featureY_name, latent_featureY_name).
- Type:
DataArray
- fit(
- X: DataArray,
- Y: DataArray,
- sample_name: str = 'time',
- featureX_name: str = 'channel',
- featureY_name: str = 'channel',
Find the canonical vectors Wx and Wy for the datasets X and Y.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
sample_name (str, optional) – Label for sample dimension, set to ‘time’ by default.
featureX_name (str, optional) – Label for X-feature dimension, set to ‘channel’ by default.
featureY_name (str, optional) – Label for Y-feature dimension, set to ‘channel’ by default.
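A minimal usage sketch with synthetic data; the class and method signatures follow the documentation above, while the data, sizes, and regularization values are purely illustrative:

```python
import numpy as np
import xarray as xr
from cedalion.sigdecomp.multimodal.cca_models import ElasticNetCCA

rng = np.random.default_rng(0)
n_samples, nx, ny = 500, 8, 6

# Two synthetic modalities sharing a common latent driver s(t).
s = rng.standard_normal((n_samples, 1))
X = xr.DataArray(
    s @ rng.standard_normal((1, nx)) + 0.5 * rng.standard_normal((n_samples, nx)),
    dims=("time", "channel"),
    coords={"time": np.arange(n_samples), "channel": [f"x{i}" for i in range(nx)]},
)
Y = xr.DataArray(
    s @ rng.standard_normal((1, ny)) + 0.5 * rng.standard_normal((n_samples, ny)),
    dims=("time", "channel"),
    coords={"time": np.arange(n_samples), "channel": [f"y{i}" for i in range(ny)]},
)

model = ElasticNetCCA(N_components=2, l1_reg=0.1, l2_reg=0.01)
model.fit(X, Y, sample_name="time", featureX_name="channel", featureY_name="channel")

# Project both modalities onto the learned canonical components.
Sx, Sy = model.transform(X, Y)
```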
- class cedalion.sigdecomp.multimodal.cca_models.StructuredSparseCCA(
- N_components: int = None,
- Lx: ndarray = None,
- Ly: ndarray = None,
- l1_reg: float | list[float, float] = 0,
- l2_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- pls: bool = False,
Bases:
MultimodalSourceDecomposition
Perform structured sparse Canonical Correlation Analysis (ssCCA) between two datasets X and Y.
The ssCCA algorithm is based on Chen et al. [CBL+13] and assumes the underlying X and Y features are linked through a graph structure. It finds sparse (L1) vectors Wx and Wy as the solution to the following constrained optimization problem:
maximize Wx^T Cxy Wy subject to Wx^T Cx Wx = 1, Wy^T Cy Wy = 1,
||Wx||_1 <= c1x, ||Wy||_1 <= c1y, Wx^T Lx Wx <= c2x, Wy^T Ly Wy <= c2y
where Cx, Cy, and Cxy are the individual and cross-covariance matrices between the X and Y datasets. The second constraint is the standard L1-norm penalization term, while the last constraint incorporates local information about the spatial distribution of the features through the Laplacian matrices Lx and Ly. These terms encourage filter components that are linked in the graph structure to have similar values, making them vary smoothly across the graph. c1x and c1y control sparsity, while c2x and c2y control the relative importance of the graph structure.
For the one-unit algorithm, first Cx and Cy are shifted by Cx <- Cx + alpha_x Lx and Cy <- Cy + alpha_y Ly, and then SVD decomposition is performed on the whitened cross-covariance matrix K = Cx^(-1/2) Cxy Cy^(-1/2), using the following standard alternating power method (based on Parkhomenko et al. [PTB09]):
- Update u:
u <- K * v
u <- u / ||u||
- If L1:
u <- SoftThresholding(u, lambda_u/2)
u <- u / ||u||
- Update v:
v <- K^T * u
v <- v / ||v||
- If L1:
v <- SoftThresholding(v, lambda_v/2)
v <- v / ||v||
The resulting u and v are the leading left and right singular vectors of K, which are precisely the individual components of the filters Wx and Wy. The soft-thresholding function brings some components to zero.
Multiple components are obtained via a deflation method, subtracting from K its rank-1 approximation on each iteration. The returned vectors Wx and Wy are ordered in descending order w.r.t. the singular values, which coincide with the canonical correlations.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
Lx (ndarray) – Laplacian matrix for modality X.
Ly (ndarray) – Laplacian matrix for modality Y.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
l2_reg (float or list of floats) – List containing alpha_x and alpha_y (see above). If a single float is provided, then alpha_x = alpha_y.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
pls (bool) – Whether to perform PLS regression. Defaults to False.
- Wx[source]
Linear filters for dataset X with dimensions (featureX_name, latent_featureX_name).
- Type:
DataArray
- Wy[source]
Linear filters for dataset Y with dimensions (featureY_name, latent_featureY_name).
- Type:
DataArray
- fit(
- X: DataArray,
- Y: DataArray,
- sample_name: str = 'time',
- featureX_name: str = 'channel',
- featureY_name: str = 'channel',
Find the canonical vectors Wx and Wy for the datasets X and Y.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
sample_name (str, optional) – Label for sample dimension, set to ‘time’ by default.
featureX_name (str, optional) – Label for X-feature dimension, set to ‘channel’ by default.
featureY_name (str, optional) – Label for Y-feature dimension, set to ‘channel’ by default.
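A sketch of how graph Laplacians might be constructed and passed to the model; the chain-graph structure and all parameter values below are illustrative assumptions:

```python
import numpy as np
import xarray as xr
from cedalion.sigdecomp.multimodal.cca_models import StructuredSparseCCA

def chain_laplacian(n: int) -> np.ndarray:
    """Laplacian L = D - A of a chain graph linking neighbouring features."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(1)
n_samples, nx, ny = 300, 6, 4
X = xr.DataArray(rng.standard_normal((n_samples, nx)), dims=("time", "channel"))
Y = xr.DataArray(rng.standard_normal((n_samples, ny)), dims=("time", "channel"))

model = StructuredSparseCCA(
    N_components=2,
    Lx=chain_laplacian(nx),
    Ly=chain_laplacian(ny),
    l1_reg=0.05,
    l2_reg=0.1,
)
model.fit(X, Y)
Sx, Sy = model.transform(X, Y)
```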
- class cedalion.sigdecomp.multimodal.cca_models.RidgeCCA(
- N_components: int = None,
- l2_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
ElasticNetCCA
Perform CCA between two datasets X and Y with L2 regularization, a.k.a. ridge CCA.
This algorithm is a particular case of the one implemented in the ElasticNetCCA class. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
l2_reg (float or list of floats) – List containing alpha_x and alpha_y (see above). If a single float is provided, then alpha_x = alpha_y.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- class cedalion.sigdecomp.multimodal.cca_models.SparseCCA(
- N_components: int = None,
- l1_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
ElasticNetCCA
Perform Sparse CCA between two datasets X and Y with L1 regularization, a.k.a. sparse CCA, based on Parkhomenko et al. [PTB09].
This algorithm is a particular case of the one implemented in the ElasticNetCCA class. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- class cedalion.sigdecomp.multimodal.cca_models.CCA(
- N_components: int = None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
ElasticNetCCA
Perform CCA between two datasets X and Y.
This algorithm is a particular case of the one implemented in the ElasticNetCCA class. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- class cedalion.sigdecomp.multimodal.cca_models.SparsePLS(
- N_components: int = None,
- l1_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
ElasticNetCCA
Perform Partial Least Squares (PLS) between two datasets X and Y with L1 regularization, a.k.a. sparse PLS, based on a combination of Parkhomenko et al. [PTB09] and Witten et al. [WTH09].
In Witten’s paper, the algorithm is presented as a particular case of their Penalized Matrix Decomposition (PMD) method, called PMD(L1, L1) or Sparse CCA. However, the latter name is misleading since in this problem we use identity matrices for the L2-norm constraints, rather than correlation matrices. That difference makes the method truly a SparsePLS one. Here, Witten’s method is modified by adding normalization on each iteration and dividing L1 parameters by 2.
This algorithm is a particular case of the one implemented in the ElasticNetCCA class. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- class cedalion.sigdecomp.multimodal.cca_models.PLS(
- N_components: int = None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
Bases:
SparsePLS
Perform PLS between two datasets X and Y. This algorithm is a particular case of the one implemented in the SparsePLS class when no penalty is imposed. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
max_iter (int) – Maximum number of iterations for the algorithm. Defaults to 100.
tol (float) – Tolerance for convergence. Defaults to 1e-6.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
- cedalion.sigdecomp.multimodal.cca_models.estimate_filters(
- X: ndarray,
- Y: ndarray,
- N_components: int,
- l1_reg: list[float, float] = 0,
- l2_reg: list[float, float] = 0,
- Lx: ndarray = None,
- Ly: ndarray = None,
- pls: bool = False,
Estimate the canonical vectors Wx and Wy for the datasets X and Y.
Main function that estimates the canonical vectors Wx and Wy for the datasets X and Y using an Elastic Net CCA algorithm, with the option of incorporating Laplacian matrices Lx and Ly, which turns the algorithm into a structured sparse CCA. It assumes X and Y have shape (samples, features).
- Parameters:
X (ndarray) – Input data for modality X with shape (Nt, Nx).
Y (ndarray) – Input data for modality Y with shape (Nt, Ny).
N_components (int) – Number of components to extract.
l1_reg (list of floats) – list containing lambda_u and lambda_v.
l2_reg (list of floats) – list containing alpha_x and alpha_y.
Lx (ndarray, optional) – Laplacian matrix for modality X. Defaults to None.
Ly (ndarray, optional) – Laplacian matrix for modality Y. Defaults to None.
pls (bool, optional) – Whether to perform PLS regression. Defaults to False.
- Returns:
A tuple (Wx, Wy) with the canonical vectors for X and Y, ordered by descending singular values.
- Return type:
tuple
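For orientation, a minimal sketch (not the library's implementation) of how the whitened cross-covariance matrix K described above could be assembled from already standardized data, reusing the inv_sqrt_cov helper documented below. The 1/(Nt - 1) normalization and the function name are assumptions:

```python
import numpy as np
from cedalion.sigdecomp.multimodal.cca_models import inv_sqrt_cov

def whitened_cross_covariance(X, Y, alpha_x=0.0, alpha_y=0.0, pls=False):
    """Sketch: K = Cx^(-1/2) Cxy Cy^(-1/2), reduced to K = Cxy for PLS."""
    Nt = X.shape[0]
    Cxy = X.T @ Y / (Nt - 1)                 # cross-covariance of standardized data
    if pls:
        return Cxy                           # PLS: Cx and Cy replaced by identities
    Cx = X.T @ X / (Nt - 1) + alpha_x * np.eye(X.shape[1])  # L2-shifted covariances
    Cy = Y.T @ Y / (Nt - 1) + alpha_y * np.eye(Y.shape[1])
    return inv_sqrt_cov(Cx) @ Cxy @ inv_sqrt_cov(Cy)
```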
- cedalion.sigdecomp.multimodal.cca_models.inv_sqrt_cov(C: ndarray, eps: float = 1e-10) ndarray [source]
Compute the inverse square root of a covariance matrix C.
Given a (symmetric) covariance matrix C, it computes C^{-1/2} = U Lambda^{-1/2} U^T using eigen-decomposition and clipping the inverse diagonal entries to eps to avoid division by zero and instabilities.
- Parameters:
C (ndarray) – Covariance matrix. Expected to be square and symmetric.
eps (float, optional) – Small value to avoid division by zero during inversion.
- Returns:
Inverse square root of C of the same size as input matrix.
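A minimal numpy sketch of the formula above; the handling of eps (clipping the eigenvalues from below before inversion) is one plausible reading, not necessarily the library's exact behavior:

```python
import numpy as np

def inv_sqrt_cov_sketch(C: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Sketch: C^(-1/2) = U Lambda^(-1/2) U^T for a symmetric matrix C."""
    lam, U = np.linalg.eigh(C)                          # eigen-decomposition
    inv_sqrt = 1.0 / np.sqrt(np.clip(lam, eps, None))   # clip eigenvalues before inverting
    return (U * inv_sqrt) @ U.T                         # equals U @ diag(inv_sqrt) @ U.T
```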
- cedalion.sigdecomp.multimodal.cca_models.get_singular_vectors(
- X: ndarray,
- N_components: int,
- l1_reg: float | list[float, float] = 0,
Extracts the top singular vectors from X using an iterative power method.
The function iteratively extracts one sparse singular component at a time. On each iteration, it computes the leading singular pair using an alternating power method, subtracts the rank-1 approximation from X, and stores the component. Sparsity is enforced via L1 regularization when l1_reg > 0.
- Parameters:
X (ndarray) – Input matrix of shape (M, N) from which singular vectors are extracted.
N_components (int) – Number of singular components to extract.
l1_reg (float or list of floats, optional) – Regularization parameter for L1 sparsity. If scalar, the same value is applied for both u and v. Defaults to 0 (no sparsity).
- Returns:
U (ndarray): Matrix of left singular vectors with shape (M, N_components).
S (ndarray): Array of singular values with length N_components.
V (ndarray): Matrix of right singular vectors with shape (N, N_components).
- Return type:
tuple
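An illustrative sketch of the deflation loop described above, reusing the leading_singular_pair_power_method function documented below; the wrapper function sparse_svd_by_deflation is hypothetical:

```python
import numpy as np
from cedalion.sigdecomp.multimodal.cca_models import leading_singular_pair_power_method

def sparse_svd_by_deflation(K: np.ndarray, n_components: int, l1_reg=0.0):
    """Sketch: peel off one (sparse) singular pair at a time via deflation."""
    K = K.copy()
    U, S, V = [], [], []
    for _ in range(n_components):
        u, sigma, v = leading_singular_pair_power_method(K, l1_reg=l1_reg)
        U.append(u.ravel())
        S.append(sigma)
        V.append(v.ravel())
        K -= sigma * np.outer(u, v)          # subtract the rank-1 approximation
    return np.column_stack(U), np.array(S), np.column_stack(V)
```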
- cedalion.sigdecomp.multimodal.cca_models.leading_singular_pair_power_method(
- X: ndarray,
- l1_reg: float | list[float, float] = 0,
- max_iter: int = 1000,
- tol: float = 1e-06,
Compute the leading (sparse) singular vector pair and value (u, sigma, v) of a matrix X using an alternating power method.
The method alternates between updating the left singular vector (u) and the right singular vector (v) until convergence following the rules:
- Update u:
u <- K * v
u <- u / ||u||
- If L1:
u <- SoftThresholding(u, lambda_u/2)
u <- u / ||u||
- Update v:
v <- K^T * u
v <- v / ||v||
- If L1:
v <- SoftThresholding(v, lambda_v/2)
v <- v / ||v||
Sparsity is enforced via soft-thresholding if the corresponding regularization parameters (lambda_u and lambda_v) encoded in l1_reg are set to a nonzero value.
- Parameters:
X (ndarray) – Input matrix of shape (m, n).
l1_reg (int, float, or list, optional) – L1 regularization parameter(s) for sparsity. If a scalar, the same value is applied to both u and v; if a list of two values, the first is used for u and the second for v. Defaults to 0 (no sparsity).
max_iter (int, optional) – Maximum number of iterations. Defaults to 1000.
tol (float, optional) – Convergence tolerance. Defaults to 1e-6.
- Returns:
u (np.ndarray): Leading left singular vector of shape (m, 1).
sigma (float): Leading singular value.
v (np.ndarray): Leading right singular vector of shape (n, 1).
- Return type:
tuple
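A self-contained numpy sketch of the alternating update rules listed above; the function and parameter names are illustrative, and this is not the library's implementation:

```python
import numpy as np

def soft_threshold(x: np.ndarray, thr: float) -> np.ndarray:
    """Elementwise soft-thresholding: shrink towards zero by thr."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def power_method_sketch(K, lambda_u=0.0, lambda_v=0.0, max_iter=1000, tol=1e-6):
    """Alternating power method for the leading (sparse) singular pair of K."""
    m, n = K.shape
    v = np.ones(n) / np.sqrt(n)
    u = np.zeros(m)
    for _ in range(max_iter):
        u_old = u
        u = K @ v
        u /= np.linalg.norm(u)
        if lambda_u > 0:
            u = soft_threshold(u, lambda_u / 2)
            u /= np.linalg.norm(u)           # assumes thresholding leaves a nonzero vector
        v = K.T @ u
        v /= np.linalg.norm(v)
        if lambda_v > 0:
            v = soft_threshold(v, lambda_v / 2)
            v /= np.linalg.norm(v)
        if np.linalg.norm(u - u_old) < tol:  # convergence of the left vector
            break
    sigma = float(u @ K @ v)                 # leading singular value estimate
    return u, sigma, v
```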
cedalion.sigdecomp.multimodal.mspoc module
Implements the mSPoC algorithm for multimodal data.
- class cedalion.sigdecomp.multimodal.mspoc.mSPoC(
- N_components: int = None,
- time_shifts=None,
- N_restarts: int = 2,
- max_iter: int = 200,
- tol: float = 1e-05,
- scale: bool = True,
- shift_source: bool = True,
Bases:
object
Implements the multimodal Source Power Co-modulation (mSPoC) algorithm based on Dähne et al. [DBM+13].
Given two vector-valued time series X(t) and Y(t), mSPoC finds component pairs Sx = Wx.T @ X and Sy = Wy.T @ Y, such that the covariance between the temporally-embedded bandpower of Sx and the time course of Sy is maximized. The solution to this optimization problem is captured by the spatial (Wx, Wy) and temporal (Wt) filters.
X(t) must be of shape Ntx x Nx, where Nx is the number of channels and Ntx the number of time points, and it is band-pass filtered in the frequency band of interest. The bandpower of Sx is then approximated by its variance within epochs. The epochs are defined by the time points of Y(t), which is sampled at a lower rate: Y(t) has shape Nty x Ny, with Nty < Ntx. Both signals are mean-free and temporally aligned.
- Parameters:
N_components (int) – Number of component pairs the algorithm will find.
time_shifts (list) – List of time shifts to consider in the temporal embedding.
N_restarts (int) – Number of times the algorithm is repeated.
max_iter (int) – Maximum number of iterations.
tol (float) – Tolerance value used for convergence criterion when comparing correlations of consecutive runs.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
shift_source (bool) – Whether to shift the reconstructed sources by the optimal time lag found during training. Defaults to True.
- fit(
- X: DataArray,
- Y: DataArray,
- featureX_name: str = 'channel',
- featureY_name: str = 'channel',
Train the mSPoC model on the X and Y datasets.
Implements the pseudo-code of Algorithm 1 of Dähne et al. [DBM+13] for a single component pair. After training, the filter attributes Wx, Wy, and Wt are updated.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
featureX_name (str) – Name of the feature dimension for X.
featureY_name (str) – Name of the feature dimension for Y.
- transform(
- X: DataArray,
- Y: DataArray,
Get the reconstructed sources of the X and Y datasets.
The X component is constructed by computing the bandpower of the X projection along Wx and then applying a linear temporal filter using Wt. The Y component is constructed as the linear projection of Y along Wy.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
- A tuple (Sx, Sy) where:
Sx (DataArray): Reconstructed source of modality X.
Sy (DataArray): Reconstructed source of modality Y.
- Return type:
tuple
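A minimal usage sketch with synthetic data where X is sampled faster than Y; the sampling rates, time shifts, and channel counts are illustrative assumptions:

```python
import numpy as np
import xarray as xr
from cedalion.sigdecomp.multimodal.mspoc import mSPoC

rng = np.random.default_rng(0)
fs_x, fs_y, duration = 50.0, 2.0, 60.0       # X sampled faster than Y
t_x = np.arange(0.0, duration, 1.0 / fs_x)
t_y = np.arange(0.0, duration, 1.0 / fs_y)

# The algorithm expects band-pass filtered, mean-free X and a slower Y;
# white noise is used here purely for illustration.
X = xr.DataArray(rng.standard_normal((t_x.size, 16)),
                 dims=("time", "channel"), coords={"time": t_x})
Y = xr.DataArray(rng.standard_normal((t_y.size, 8)),
                 dims=("time", "channel"), coords={"time": t_y})

model = mSPoC(N_components=1, time_shifts=[0, 1, 2], N_restarts=2)
model.fit(X, Y)
Sx, Sy = model.transform(X, Y)
```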
- one_unit_algorithm(Y, Cxx, Cxxe, tCxxe, time)[source]
Run the one-unit algorithm of mSPoC to compute a single set of filters.
- deflate_data(x, w)[source]
Deflate data by removing from x the contribution of the projection on w.
- get_bandpower(W, C)[source]
Compute bandpower with temporal embedding.
It estimates the bandpower of a signal by computing the variance within epochs.
- validate_inputs_fit(
- X: DataArray,
- Y: DataArray,
Validates input data of fit function and returns it with the correct dimensions and labels.
This method ensures that the input data to the fit function, X and Y, i.e. those used for training, have the expected dimension labels and sizes and returns them with the dimensions ordered as (sample_name, feature_name). It also initializes the number of samples and features.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
- A tuple (X, Y) where:
X (DataArray): Input data for modality X ordered as (sample_name, featureX_name).
Y (DataArray): Input data for modality Y ordered as (sample_name, featureY_name).
- Return type:
tuple
- validate_inputs_transform(
- X: DataArray,
- Y: DataArray,
Validates that the to-be-transformed data have the expected dimension labels and sizes.
This method ensures that X and Y have the same dimension labels and number of features as those used during training.
- Parameters:
X (DataArray) – Input data for modality X.
Y (DataArray) – Input data for modality Y.
- Returns:
- A tuple (X, Y) where:
X (DataArray): Input data for modality X ordered as (sample_name, featureX_name).
Y (DataArray): Input data for modality Y ordered as (sample_name, featureY_name).
- Return type:
tuple
- convert_filters_to_DataArray(
- Wx: ndarray,
- Wy: ndarray,
- Wt: ndarray,
- X: DataArray,
- Y: DataArray,
Convert filters Wx, Wy, and Wt from numpy array format to DataArrays with the right dimensions and coordinates.
- Parameters:
Wx (ndarray) – Filter matrix for modality X with shape (Nx, N_components).
Wy (ndarray) – Filter matrix for modality Y with shape (Ny, N_components).
Wt (ndarray) – Filter matrix for time embedding with shape (N_shifts, N_components).
X (DataArray) – DataArray containing the features of modality X.
Y (DataArray) – DataArray containing the features of modality Y.
- Returns:
A tuple containing the DataArray versions of Wx, Wy, and Wt.
- Return type:
tuple[DataArray, DataArray, DataArray]
- cedalion.sigdecomp.multimodal.mspoc.temporal_embedding(
- X: ndarray,
- time_shifts: ndarray,
- time: ndarray,
Construct a time-embedded version of a matrix X.
- Parameters:
X (ndarray) – Matrix to embed in time.
time_shifts (ndarray) – Array of time shifts to consider.
time (ndarray) – Array of time points.
- Returns:
Time-embedded version of X.
- Return type:
ndarray
- cedalion.sigdecomp.multimodal.mspoc.get_orthonormal_matrix(W: ndarray) ndarray [source]
Generate an orthonormal basis for an N-dimensional space where the columns of W are some of the basis vectors.
- Parameters:
W (np.ndarray) – An N x Nc array representing the given vectors.
- Returns:
An N x (N - Nc) orthonormal basis matrix in which the columns of W are not present.
- Return type:
basis (np.ndarray)
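One way to obtain such a complement basis, shown as a sketch (this is not necessarily how the library computes it):

```python
import numpy as np
from scipy.linalg import null_space

def orthonormal_complement(W: np.ndarray) -> np.ndarray:
    """Sketch: orthonormal basis of the subspace orthogonal to the columns of W."""
    # null_space(W.T) has orthonormal columns spanning the orthogonal complement
    # of range(W), i.e. an N x (N - Nc) matrix for a full-column-rank N x Nc input.
    return null_space(W.T)
```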
cedalion.sigdecomp.multimodal.tcca_models module
Module for temporally embedded CCA-like models. The temporal embedding technique is based on Bießmann et al. [BMG+10].
- class cedalion.sigdecomp.multimodal.tcca_models.MultimodalSourceDecompositionWithTemporalEmbedding(
- N_components: int = None,
- time_shifts: ndarray = None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- shift_source=True,
Bases:
MultimodalSourceDecomposition
Class for decomposing multimodal data (X and Y) into latent sources using linear filters and temporal embedding.
This base class is inherited by other source decomposition methods, such as tCCA. It implements methods to validate input dimensions, apply normalization, and transform data from two modalities using filters learned during training. It assumes modality Y is delayed with respect to modality X.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
time_shifts (np.ndarray) – Array with time shifts to be used for temporal embedding.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
shift_source (bool) – Whether to shift the reconstructed sources by the optimal time shift found during training. Defaults to True.
- estimate_optimal_shift(
- X_emb: DataArray,
- Y: DataArray,
Find optimal time shifts for X, one per component.
It finds the optimal time shift for each component by looking for the time-shifted X that produces the largest correlation between the reconstructed sources sx and sy.
- Parameters:
X_emb (DataArray) – Time-embedded version of X with dimensions (time_shift, sample_name, featureX_name).
Y (DataArray) – Input data for modality Y with dimensions (sample_name, featureY_name).
- shift_by_optimal(
- X: DataArray,
Shift X by optimal time shift using zero padding.
- transform(
- X: DataArray,
- Y: DataArray,
Apply the linear transformation on the input data using learned filters.
This method validates the dimension labels and sizes of the input data to ensure consistency with the training data, performs temporal embedding on X, applies normalization using the stored parameters, and then projects the normalized data onto a lower-dimensional space using the learned filters Wx and Wy. It returns the transformed arrays, a.k.a. the reconstructed sources.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
- Returns:
- A tuple (X_new, Y_new) where:
X_new (DataArray): Transformed data for modality X.
Y_new (DataArray): Transformed data for modality Y.
- Return type:
tuple
- class cedalion.sigdecomp.multimodal.tcca_models.ElasticNetTCCA(
- N_components: int = None,
- l1_reg: float | list[float, float] = 0,
- l2_reg: float | list[float, float] = 0,
- time_shifts=None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- pls: bool = False,
- shift_source=True,
Bases:
MultimodalSourceDecompositionWithTemporalEmbedding
Perform Elastic Net Canonical Correlation Analysis (CCA) between X_emb and Y.
Apply temporally embedded CCA (tCCA) with L1 + L2 regularization, a.k.a. elastic net. The algorithm finds sparse (L1) and normalized (L2) vectors Wx and Wy as the solution to the following constrained optimization problem:
maximize Wx^T Cxy Wy subject to Wx^T Cx Wx = 1, Wy^T Cy Wy = 1,
||Wx||_1 <= c1x, ||Wy||_1 <= c1y, ||Wx||^2_2 <= c2x, ||Wy||^2_2 <= c2y
where Cx, Cy, and Cxy are the individual and cross-covariance matrices between the X_emb and Y datasets, and the last four constraints correspond to the standard L1-norm and L2-norm penalization terms. c1x and c1y control sparsity, while c2x and c2y control the magnitude of the vectors. PLS algorithms are also captured by this algorithm by setting Cx and Cy to the identity matrices.
The temporally embedded matrix X_emb is constructed by concatenating time-shifted versions of the original X. Y is assumed to be delayed with respect to X, so time shifts are always positive.
For the one-unit algorithm, a (sparse) SVD is performed on the whitened cross-covariance matrix K = Cx^(-1/2) Cxy Cy^(-1/2) (reduced to K = Cxy for PLS), using the following standard alternating power method (based on Parkhomenko et al. [PTB09]):
- Update u:
u <- K * v
u <- u / ||u||
- If L1:
u <- SoftThresholding(u, lambda_u/2)
u <- u / ||u||
- Update v:
v <- K^T * u
v <- v / ||v||
- If L1:
v <- SoftThresholding(v, lambda_v/2)
v <- v / ||v||
The resulting u and v are the leading left and right singular vectors of K, which are precisely the individual components of the filters Wx and Wy. The soft-thresholding function brings some components to zero. If L2 regularization is used, Cx and Cy are shifted by Cx <- Cx + alpha_x I and Cy <- Cy + alpha_y I prior to computing K.
Multiple components are obtained via a deflation method, subtracting from K its rank-1 approximation on each iteration. The returned vectors Wx and Wy are ordered in descending order w.r.t. the singular values, which coincide with the canonical correlations.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
l2_reg (float or list of floats) – List containing alpha_x and alpha_y (see above). If a single float is provided, then alpha_x = alpha_y.
time_shifts (np.ndarray) – Array with time shifts to be used for temporal embedding.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
pls (bool) – Whether to perform PLS regression. Defaults to False.
shift_source (bool) – Whether to shift the reconstructed sources by the optimal time shift found during training.
- Wx[source]
Linear filters for dataset X with dimensions (featureX_name, latent_featureX_name).
- Type:
DataArray
- Wy[source]
Linear filters for dataset Y with dimensions (featureY_name, latent_featureY_name).
- Type:
DataArray
- fit(
- X: DataArray,
- Y: DataArray,
- featureX_name: str = 'channel',
- featureY_name: str = 'channel',
Find the canonical vectors Wx and Wy for the datasets X and Y.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
featureX_name (str, optional) – Label for X-feature dimension, set to ‘channel’ by default.
featureY_name (str, optional) – Label for Y-feature dimension, set to ‘channel’ by default.
- class cedalion.sigdecomp.multimodal.tcca_models.StructuredSparseTCCA(
- N_components: int = None,
- Lx=None,
- Ly=None,
- time_shifts=None,
- l1_reg: float | list[float, float] = 0,
- l2_reg: float | list[float, float] = 0,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- shift_source: bool = True,
- pls: bool = False,
Bases:
MultimodalSourceDecompositionWithTemporalEmbedding
Perform structured sparse Canonical Correlation Analysis (ssCCA) between two datasets X_emb and Y.
The sstCCA algorithm is a temporally embedded extension of Chen et al. [CBL+13], and it assumes the underlying X and Y features are linked through a graph structure. It finds sparse (L1) vectors Wx and Wy as the solution to the following constrained optimization problem:
maximize Wx^T Cxy Wy subject to Wx^T Cx Wx = 1, Wy^T Cy Wy = 1,
||Wx||_1 <= c1x, ||Wy||_1 <= c1y, Wx^T Lx Wx <= c2x, Wy^T Ly Wy <= c2y
where Cx, Cy, and Cxy are the individual and cross-covariance matrices between the X_emb and Y datasets. The second constraint is the standard L1-norm penalization term, while the last constraint incorporates local information about the spatial distribution of the features through the Laplacian matrices Lx and Ly. These terms encourage filter components that are linked in the graph structure to have similar values, making them vary smoothly across the graph. c1x and c1y control sparsity, while c2x and c2y control the relative importance of the graph structure.
The temporally embedded matrix X_emb is constructed by concatenating time-shifted versions of the original X. Y is assumed to be delayed with respect to X, so time shifts are always positive.
For the one-unit algorithm, first Cx and Cy are shifted by Cx <- Cx + alpha_x Lx and Cy <- Cy + alpha_y Ly, and then SVD decomposition is performed on the whitened cross-covariance matrix K = Cx^(-1/2) Cxy Cy^(-1/2), using the following standard alternating power method (based on Parkhomenko et al. [PTB09]):
- Update u:
u <- K * v
u <- u / ||u||
- If L1:
u <- SoftThresholding(u, lambda_u/2)
u <- u / ||u||
- Update v:
v <- K^T * u
v <- v / ||v||
- If L1:
v <- SoftThresholding(v, lambda_v/2)
v <- v / ||v||
The resulting u and v are the leading left and right singular vectors of K, which are precisely the individual components of the filters Wx and Wy. The soft-thresholding function brings some components to zero.
Multiple components are obtained via a deflation method, subtracting from K its rank-1 approximation on each iteration. The returned vectors Wx and Wy are ordered in descending order w.r.t. the singular values, which coincide with the canonical correlations.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
Lx (ndarray) – Laplacian matrix for modality X.
Ly (ndarray) – Laplacian matrix for modality Y.
time_shifts (np.ndarray) – Array with time shifts to be used for temporal embedding.
l1_reg (float or list of floats) – List containing lambda_u and lambda_v (see above). If a single float is provided, then lambda_u = lambda_v.
l2_reg (float or list of floats) – List containing alpha_x and alpha_y (see above). If a single float is provided, then alpha_x = alpha_y.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
pls (bool) – Whether to perform PLS regression. Defaults to False.
shift_source (bool) – Whether to shift the reconstructed sources by the optimal time shift found during training.
- Wx[source]
Linear filters for dataset X with dimensions (featureX_name, latent_featureX_name).
- Type:
DataArray
- Wy[source]
Linear filters for dataset Y with dimensions (featureY_name, latent_featureY_name).
- Type:
DataArray
- fit(
- X: DataArray,
- Y: DataArray,
- featureX_name: str = 'channel',
- featureY_name: str = 'channel',
Find the canonical vectors Wx and Wy for the datasets X and Y.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
featureX_name (str, optional) – Label for X-feature dimension, set to ‘channel’ by default.
featureY_name (str, optional) – Label for Y-feature dimension, set to ‘channel’ by default.
- class cedalion.sigdecomp.multimodal.tcca_models.tCCA(
- N_components: int = None,
- time_shifts: ndarray = None,
- max_iter: int = 100,
- tol: float = 1e-06,
- scale: bool = True,
- shift_source=True,
Bases:
ElasticNetTCCA
Perform tCCA between two datasets X and Y.
This algorithm is a particular case of the one implemented in the ElasticNetTCCA class. See there for a detailed explanation of the algorithm.
- Parameters:
N_components (int, optional) – Number of components to extract. If None, the number of components is set to the minimum number of features between modalities.
time_shifts (np.ndarray) – Array with time shifts to be used for temporal embedding.
max_iter (int) – Maximum number of iterations for the algorithm.
tol (float) – Tolerance for convergence.
scale (bool) – Whether to scale the data during normalization to unit variance. Defaults to True.
shift_source (bool) – Whether to shift the reconstructed sources by the optimal time shift found during training.
- Wx[source]
Linear filters for dataset X with dimensions (featureX_name, latent_featureX_name).
- Type:
DataArray
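A minimal usage sketch with synthetic data; the time shifts and data sizes are illustrative assumptions:

```python
import numpy as np
import xarray as xr
from cedalion.sigdecomp.multimodal.tcca_models import tCCA

rng = np.random.default_rng(0)
fs, duration = 10.0, 120.0
t = np.arange(0.0, duration, 1.0 / fs)

X = xr.DataArray(rng.standard_normal((t.size, 12)),
                 dims=("time", "channel"), coords={"time": t})
# Y is assumed to lag X, hence the positive time shifts probed below.
Y = xr.DataArray(rng.standard_normal((t.size, 6)),
                 dims=("time", "channel"), coords={"time": t})

model = tCCA(N_components=2, time_shifts=np.arange(0.0, 5.0, 0.5))
model.fit(X, Y)
Sx, Sy = model.transform(X, Y)
```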
- cedalion.sigdecomp.multimodal.tcca_models.temporal_embedding(
- X: DataArray,
- time_shifts: ndarray,
Construct a time-embedded version of a matrix X.
If X has shape T x N, the embedding matrix X_emb has shape P x T x N, where P is the number of time shifts in time_shifts. The embedding matrix is built by concatenating time-shifted versions of the original matrix using the shifts inside time_shifts. Zero padding is used at the beginning of each time-shifted copy to preserve the original length of the time dimension.
- Parameters:
X (DataArray) – Input data with at least a time dimension.
time_shifts (np.ndarray) – Array with time shifts to be used for temporal embedding.
- Returns:
Time-embedded version of X with dimensions (time_shift, time, …).
- Return type:
DataArray
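A simplified numpy sketch of the zero-padded embedding described above; the library function operates on DataArrays, while this sketch uses a plain 2-D array with integer sample shifts:

```python
import numpy as np

def temporal_embedding_sketch(X: np.ndarray, shifts) -> np.ndarray:
    """Sketch for a 2-D array X (time, features): stack zero-padded shifted copies."""
    copies = []
    for s in shifts:                         # integer sample shifts, s >= 0
        shifted = np.zeros_like(X)
        if s == 0:
            shifted[:] = X
        else:
            shifted[s:] = X[:-s]             # delay X by s samples, zero-pad the start
        copies.append(shifted)
    return np.stack(copies, axis=0)          # shape (n_shifts, time, features)
```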
cedalion.sigdecomp.multimodal.utils_multimodal_models module
Utility functions for multimodal models.
- cedalion.sigdecomp.multimodal.utils_multimodal_models.validate_dimension_labels(
- X: DataArray,
- Y: DataArray,
- sample_name: str,
- featureX_name: str,
- featureY_name: str,
Validates that X and Y datasets contain the expected dimension labels.
This method checks that the data arrays X and Y contain the same sample label and corresponding X and Y feature labels.
- Parameters:
X (DataArray) – Input data for modality X. Expected to have dimensions (sample_name, featureX_name).
Y (DataArray) – Input data for modality Y. Expected to have dimensions (sample_name, featureY_name).
sample_name (str) – Name of the sample dimension.
featureX_name (str) – Name of the feature dimension for X.
featureY_name (str) – Name of the feature dimension for Y.
- Raises:
ValueError – If X or Y are missing the expected feature dimension labels.
- cedalion.sigdecomp.multimodal.utils_multimodal_models.validate_dimension_sizes(Ntx: int, Nty: int, N_features: int, N_components: int) int [source]
Validates dimension sizes of multimodal data.
Takes the sample dimension sizes of the X and Y datasets, Ntx and Nty, the number of features, and the number of components, and checks that they are consistent with each other.
- Parameters:
Ntx (int) – Number of samples in X.
Nty (int) – Number of samples in Y.
N_features (int) – Number of features in X and Y.
N_components (int) – Number of components to extract.
- Returns:
Updated number of components.
- Return type:
N (int)
- Raises:
ValueError – If X or Y do not have the expected dimensions, if the number of samples between X and Y is inconsistent, or if the number of components exceeds the number of features.
- cedalion.sigdecomp.multimodal.utils_multimodal_models.validate_l1_reg(l1_reg: float | list[float, float]) list[float, float] [source]
Check correct format of L1 regularization parameter.
- Parameters:
l1_reg (float or list of floats) – L1 regularization parameter(s) for sparsity. If a scalar, the same value is applied to both u and v; if a list of two values, the first is used for u and the second for v.
- Returns:
List with the L1 regularization parameters; the first is used for u and the second for v.
- Return type:
list of floats
- cedalion.sigdecomp.multimodal.utils_multimodal_models.validate_l2_reg(l2_reg: float | list[float, float]) list[float, float] [source]
Check correct format of L2 regularization parameter.
- Parameters:
l2_reg (float or list of floats) – L2 regularization parameter(s) for normalization. If a scalar, the same value is applied to both u and v; if a list of two values, the first is used for u and the second for v.
- Returns:
List with the L2 regularization parameters; the first is used for u and the second for v.
- Return type:
list of floats
- cedalion.sigdecomp.multimodal.utils_multimodal_models.validate_time_shifts(T: float, time_shifts: ndarray) ndarray [source]
Corroborate that time shifts have the right format and are within the data domain.
This method checks that the time shifts are positive and within the data domain. It also orders the shifts in ascending order and adds a zero lag at the beginning of the series if not present.
- Parameters:
T (float) – Maximum time of the data.
time_shifts (np.ndarray) – Array of time shifts to consider.
- Returns:
- A tuple (time_shifts, N_shifts) where:
time_shifts (np.ndarray): Array of ordered, positive time shifts.
N_shifts (int): Number of time shifts.
- Return type:
tuple
- cedalion.sigdecomp.multimodal.utils_multimodal_models.standardize(
- x: DataArray,
- dim: str = 'time',
- scale: bool = True,
Standardize x along dimension dim.
It standardizes the input data x along the specified dimension dim by removing the mean value and scaling to unit variance (if scale=True).
- Parameters:
x (DataArray) – Input data to standardize.
dim (str) – Dimension to standardize along.
scale (bool) – Whether to scale the data to unit variance.
- Returns:
- A tuple (x_standard, mean, std) where:
x_standard (DataArray): Standardized version of x.
mean (DataArray): Mean value of x along dimension dim.
std (DataArray): Standard deviation of x along dimension dim.
- Return type:
tuple
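A minimal xarray sketch of this behavior; the function name is hypothetical and the implementation is illustrative:

```python
import numpy as np
import xarray as xr

def standardize_sketch(x: xr.DataArray, dim: str = "time", scale: bool = True):
    """Sketch: remove the mean along dim and optionally scale to unit variance."""
    mean = x.mean(dim=dim)
    std = x.std(dim=dim) if scale else xr.ones_like(mean)
    x_standard = (x - mean) / std
    return x_standard, mean, std

x = xr.DataArray(np.random.randn(100, 4), dims=("time", "channel"))
x_std, mu, sd = standardize_sketch(x)
```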
Module contents
Multimodal signal decomposition methods.