Using Channel Variance as a Proxy for Measurement Noise and as a Weight for Global Physiology Removal
Channel pruning is not always the best way to improve statistics. An alternative is to weight channels in the calculation of averages (e.g. across subjects) or in image reconstruction. One way of weighting channels is by their estimated measurement noise. Variance can serve as a proxy for measurement noise, e.g. when calculated across trials of the same condition (within subject) or across time on the residual after a GLM fit. This notebook is work in progress and explores this approach with a helper function (quality.measurement_variance). We will first build an intuition for how to use quality.measurement_variance, and then use its output for weighted global physiology removal with physio.global_component_subtract.
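Before using the helper function, here is a minimal NumPy sketch (with made-up data, not cedalion code) of the two noise proxies mentioned above: the variance across repeated trials of the same condition, and the variance of a model residual (here the trial average simply stands in for a fitted GLM).
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_time = 20, 100
true_response = np.sin(np.linspace(0, np.pi, n_time))

# two hypothetical channels with the same response but different noise levels
trials_clean = true_response + 0.1 * rng.standard_normal((n_trials, n_time))
trials_noisy = true_response + 1.0 * rng.standard_normal((n_trials, n_time))

for name, trials in [("clean", trials_clean), ("noisy", trials_noisy)]:
    var_across_trials = trials.var(axis=0).mean()   # variance across trials, averaged over time
    residual = trials - trials.mean(axis=0)         # residual after removing the "model" (trial mean)
    print(f"{name}: var across trials = {var_across_trials:.3f}, residual var = {residual.var():.3f}")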
[1]:
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
import cedalion
import cedalion.datasets as datasets
import cedalion.nirs
import cedalion.sigproc.quality as quality
import cedalion.sigproc.motion_correct as motion_correct
from cedalion import units
import cedalion.xrutils as xrutils
[2]:
# some plotting helper functions for this notebook
import xarray as xr
def plot_heatmap(da, cov_wavelength=None, figsize=(12, 4), cmap=None):
    dims = da.dims
    # VARIANCE CASE: dims = ("channel", "wavelength")
    if set(dims) == {"channel", "wavelength"}:
        # Convert to pandas DataFrame so that rows = channels, cols = wavelengths
        df = da.to_pandas()
        # We want channels on the x-axis, wavelengths on the y-axis.
        # df.values has shape (n_channels, n_wavelengths), so transpose → (n_wavelengths, n_channels)
        arr = df.values.T
        x_labels = df.index.tolist()                    # channel names
        y_labels = [str(int(wl)) for wl in df.columns]  # wavelength values as strings
        x_dim_name = "channel"
        y_dim_name = "wavelength"
        cbar_label = "Variance"
    # COVARIANCE CASE: dims = ("wavelength", "channel1", "channel2")
    elif set(dims) == {"wavelength", "channel1", "channel2"}:
        if cov_wavelength is None:
            raise ValueError(
                "When da.dims == ('wavelength','channel1','channel2'), you must supply cov_wavelength."
            )
        # Extract the 2D slice at that wavelength
        da2d = da.sel(wavelength=cov_wavelength)
        # Make sure dims are in order (channel1, channel2)
        da2d = da2d.transpose("channel1", "channel2")
        arr = da2d.values  # shape = (n_channel1, n_channel2)
        x_labels = da2d.coords["channel2"].values.tolist()
        y_labels = da2d.coords["channel1"].values.tolist()
        x_dim_name = "channel2"
        y_dim_name = "channel1"
        cbar_label = f"Covariance (λ={cov_wavelength})"
    else:
        raise ValueError(f"Unsupported DataArray dimensions: {dims}")
    # Plot the 2D array with imshow
    fig, ax = plt.subplots(figsize=figsize)
    im = ax.imshow(arr, aspect="auto", cmap=cmap)
    # Set x-axis ticks/labels
    ax.set_xticks(range(len(x_labels)))
    ax.set_xticklabels(x_labels, rotation=90, fontsize=8)
    # Set y-axis ticks/labels
    ax.set_yticks(range(len(y_labels)))
    ax.set_yticklabels(y_labels, fontsize=8)
    # Label axes from the dimension names
    ax.set_xlabel(x_dim_name)
    ax.set_ylabel(y_dim_name)
    # Add a colorbar
    cbar = fig.colorbar(im, ax=ax)
    cbar.set_label(cbar_label)
    plt.tight_layout()
    return fig, ax
def plot_selected_channels(
    rec: xr.Dataset,
    channels: list,
    wavelength: float,
    da_name: str = "od",
    figsize: tuple = (12, 4),
    time_xlim: tuple = (0, 500),
):
    fig, ax = plt.subplots(1, 1, figsize=figsize)
    for ch in channels:
        series = rec[da_name].sel({"channel": ch, "wavelength": wavelength})
        ax.plot(rec[da_name].time, series, label=f"{ch} {wavelength}nm")
    ax.legend()
    ax.set_xlim(*time_xlim)
    ax.set_xlabel("time / s")
    ax.set_ylabel("Signal intensity / a.u.")
    plt.show()
Channel Variance as a Proxy for Measurement Noise
Plain Channel Variance
Note: channel variance can only serve as a proxy for measurement noise if it is calculated on optical density (OD) or concentration (CONC) data. Do not calculate it on raw intensity.
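The reason is that the variance of raw intensity scales with the arbitrary source-detector coupling (gain) of each channel, while optical density is a relative measure. A small standalone NumPy sketch (hypothetical numbers, not cedalion code) illustrates this:
import numpy as np

rng = np.random.default_rng(1)
n_time = 1000
relative_fluct = 1 + 0.01 * rng.standard_normal(n_time)  # identical 1% physiological fluctuations

intensity_a = 1e-1 * relative_fluct   # well-coupled channel (high gain)
intensity_b = 1e-4 * relative_fluct   # poorly coupled channel (low gain)

od_a = -np.log(intensity_a / intensity_a.mean())
od_b = -np.log(intensity_b / intensity_b.mean())

print("intensity variances:", intensity_a.var(), intensity_b.var())  # differ by a factor of ~1e6
print("OD variances       :", od_a.var(), od_b.var())                # identical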
[3]:
# get example finger tapping dataset
rec = datasets.get_fingertapping()
rec["od"] = cedalion.nirs.int2od(rec["amp"])
# Plot some data for visual validation
f,ax = plt.subplots(1,1, figsize=(12,4))
ax.plot( rec["od"].time, rec["od"].sel(channel="S1D1", wavelength="850"), "r-", label="S1D1 850nm")
ax.plot( rec["od"].time, rec["od"].sel(channel="S1D1", wavelength="760"), "b-", label="S1D1 760nm")
plt.legend()
ax.set_xlim(0, 500)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
[3]:
Text(0, 0.5, 'Signal intensity / a.u.')

Calculate variance of all channels and display results
[4]:
# calculate variance of optical density (OD) measurements for all channels and wavelengths
od_var = quality.measurement_variance(rec["od"])
fig, ax = plot_heatmap(od_var)
plt.show()

From the plot above we can identify S6D8 (760nm) as a channel with high variance and S1D2 (760nm) as a channel with low variance. S7D6 is somewhere in between. Let's investigate what the corresponding time series look like.
[5]:
# Plot some data for visual validation
f,ax = plt.subplots(1,1, figsize=(12,4))
ax.plot( rec["od"].time, rec["od"].sel(channel="S6D8", wavelength="760"), "r-", label="S6D8 760nm")
ax.plot( rec["od"].time, rec["od"].sel(channel="S1D2", wavelength="760"), "b-", label="S1D2 760nm")
ax.plot( rec["od"].time, rec["od"].sel(channel="S7D6", wavelength="760"), "y-", label="S7D6 760nm")
plt.legend()
ax.set_xlim(0, 500)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
[5]:
Text(0, 0.5, 'Signal intensity / a.u.')

We can see that the channel with high variance has motion artifacts. These can be removed with motion correction methods, and we can recalculate the variance to see whether this helped. If we do not correct them and use the channel variance as is for weighting in further processing, the channel with motion artifacts will simply be downweighted, as it has higher variance.
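To see why the downweighting works, note that even a short motion-like spike inflates the variance considerably. A toy NumPy example (synthetic data):
import numpy as np

rng = np.random.default_rng(9)
n_time = 1000
clean = 0.05 * rng.standard_normal(n_time)

with_artifact = clean.copy()
with_artifact[400:410] += 2.0   # a short motion-like spike (10 samples)

print("variance clean        :", clean.var().round(4))
print("variance with artifact:", with_artifact.var().round(4))   # an order of magnitude larger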
[6]:
# motion correction using the wavelet and tddr methods
rec["od_corrected"] = motion_correct.tddr(rec["od"])
rec["od_corrected"] = motion_correct.motion_correct_wavelet(rec["od_corrected"])
# Plot corrected data for visual validation
f,ax = plt.subplots(1,1, figsize=(12,4))
ax.plot( rec["od_corrected"].time, rec["od_corrected"].sel(channel="S6D8", wavelength="760"), "r-", label="S6D8 760nm")
ax.plot( rec["od_corrected"].time, rec["od_corrected"].sel(channel="S1D2", wavelength="760"), "b-", label="S1D2 760nm")
ax.plot( rec["od_corrected"].time, rec["od_corrected"].sel(channel="S7D6", wavelength="760"), "y-", label="S7D6 760nm")
plt.legend()
ax.set_xlim(0, 500)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
# calculate variance on the corrected signal
od_var2 = quality.measurement_variance(rec["od_corrected"])
## Display results as a heatmap
fig, ax = plot_heatmap(od_var2)
plt.show()


We can see that motion correction removed some of the variance (and thereby fixed channels like S6D8), but not all of it; S4D4 remains partially noisy.
Channel Variance with Flagged Bad Channels
There are cases in which we do not trust channel variance as a proxy for measurement noise, for example saturated channels. We might also want to penalize channels with motion artifacts particularly strongly and, for instance, exclude S4D4, which only partially benefited from the motion correction. For this we can provide a list of "bad" channels and a custom weight.
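Conceptually, the penalization simply multiplies the variance of the flagged channels by a large factor, as in the sketch below (made-up variances; the actual implementation inside quality.measurement_variance may differ in its details):
import xarray as xr

# hypothetical per-channel variances for two channels and two wavelengths
var = xr.DataArray(
    [[1e-4, 2e-4], [3e-3, 4e-3]],
    dims=("channel", "wavelength"),
    coords={"channel": ["S1D1", "S4D4"], "wavelength": [760.0, 850.0]},
)

bad_channels = ["S4D4"]
bad_rel_var = 1e5

# multiply flagged channels by the penalty factor, leave the rest untouched
penalty = xr.where(var.channel.isin(bad_channels), bad_rel_var, 1.0)
print(var * penalty)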
[7]:
# let's assume we do not want to do motion correction and that channel S1D1 is saturated.
# We give S1D1 a constant value of 1 V plus the system's measurement noise of 10 mV
rec["amp"].loc[{"channel": "S1D1"}] = (1 + np.random.normal(0, 10e-3, rec["amp"].sel(channel="S1D1").shape))*units.V
# now convert the signal to optical density
rec["od"] = cedalion.nirs.int2od(rec["amp"])
# Plot some data for visual validation
f,ax = plt.subplots(1,1, figsize=(12,4))
ax.plot( rec["od"].time, rec["od"].sel(channel="S1D1", wavelength="760"), "b-", label="S1D1 760nm")
ax.plot( rec["od"].time, rec["od"].sel(channel="S4D4", wavelength="850"), "r-", label="S4D4 850nm")
plt.legend()
ax.set_xlim(0, 500)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
[7]:
Text(0, 0.5, 'Signal intensity / a.u.')

Looking at the resulting variance of the saturated channel S1D1 and comparing it with the noisy (motion artifact) channel S4D4…
[8]:
# calculate variance of optical density (OD) measurements for all channels and wavelengths
od_var = quality.measurement_variance(rec["od"])
# print the variance of the saturated channel S1D1 (760nm) and the noisy channel S4D4 (850nm)
print("S1D1 760nm variance:", od_var.sel(channel="S1D1", wavelength="760").values)
print("S4D4 850nm variance:", od_var.sel(channel="S4D4", wavelength="850").values)
S1D1 760nm variance: 9.99740883835876e-05
S4D4 850nm variance: 0.0031518609824904144
we can tell that the metric cannot account for saturation: the saturated channel even shows the lower variance, so we should manually drop or downweight it.
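One way to flag such channels before relying on the variance is to check the raw amplitude directly, e.g. whether a channel spends most of its time near the maximum of the acquisition system. The helper below is purely illustrative (is_saturated, adc_max and the thresholds are made up; this is not cedalion functionality):
import numpy as np

rng = np.random.default_rng(2)
n_time = 1000
adc_max = 2.5  # hypothetical maximum output of the acquisition system, in volts

normal_channel = 0.8 + 0.02 * rng.standard_normal(n_time)
saturated_channel = np.minimum(2.52 + 0.01 * rng.standard_normal(n_time), adc_max)

def is_saturated(amp, adc_max, frac=0.95, tol=0.99):
    """Flag a channel if a large fraction of samples sits near the ADC ceiling."""
    return np.mean(amp > tol * adc_max) > frac

for name, ch in [("normal", normal_channel), ("saturated", saturated_channel)]:
    print(name, "saturated?", is_saturated(ch, adc_max))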
[9]:
list_bad_channels = ["S1D1", "S4D4"]
bad_rel_var = 1e5 # we use a large factor that will be multiplied with the channel variance to effectively remove the channel from the analysis wherever it is weighted by its variance
od_var = quality.measurement_variance(rec["od"], list_bad_channels, bad_rel_var)
## Display results as a heatmap, this time on a logarithmic scale as the penalty factor is large
fig, ax = plot_heatmap(np.log(od_var))
plt.show()

Using Variance as a Proxy for Measurement Noise to Downweight Channels
Let's apply this now, for instance by normalizing signals with the noise proxy (a smaller variance amplifies the signal).
[10]:
# normalize signals by their variance
rec["normalized_od"] = rec["od"] / od_var
# Plot normalized data for visual validation
f,ax = plt.subplots(1,1, figsize=(12,4))
ax.plot( rec["normalized_od"].time, rec["normalized_od"].sel(channel="S4D12", wavelength="760"), "g-", label="S4D12 760nm, weighted with variance = " +str(od_var.sel(channel="S4D12", wavelength="760").values))
ax.plot( rec["normalized_od"].time, rec["normalized_od"].sel(channel="S1D2", wavelength="760"), "b-", label="S1D2 760nm, weighted with variance = " +str(od_var.sel(channel="S1D2", wavelength="760").values))
ax.plot( rec["normalized_od"].time, rec["normalized_od"].sel(channel="S7D6", wavelength="760"), "y-", label="S7D6 760nm, weighted with variance = " +str(od_var.sel(channel="S7D6", wavelength="760").values))
ax.plot( rec["normalized_od"].time, rec["normalized_od"].sel(channel="S1D1", wavelength="760"), "r-", label="S1D1 760nm, weighted with penalty*variance = "+str(bad_rel_var*od_var.sel(channel="S1D2", wavelength="760").values))
plt.legend()
ax.set_xlim(0, 500)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
[10]:
Text(0, 0.5, 'Signal intensity / a.u.')

Channel Covariance
Lastly, we might also be interested in channel covariance…
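For intuition: the channel-by-channel covariance is the same quantity np.cov would give on a (channel × time) array, with the per-channel variances on the diagonal and shared signal (e.g. global physiology) in the off-diagonal entries. A tiny synthetic example:
import numpy as np

rng = np.random.default_rng(3)
n_channels, n_time = 4, 1000

# hypothetical OD time series: a shared global signal plus channel-specific noise
global_signal = rng.standard_normal(n_time)
data = 0.5 * global_signal + 0.1 * rng.standard_normal((n_channels, n_time))

cov = np.cov(data)  # shape (n_channels, n_channels)
print(np.allclose(np.diag(cov), data.var(axis=1, ddof=1)))  # diagonal = per-channel variances
print(cov.round(3))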
[11]:
# use the same function to calculate the covariance of the optical density measurements
list_bad_channels = ["S1D1", "S4D4"]
bad_rel_var = 10 # a much smaller factor than the one used above, just to highlight the effect
od_covar = quality.measurement_variance(rec["od"], list_bad_channels, bad_rel_var, calc_covariance=True)
display(od_covar)
# plot the covariance matrix for one wavelength; no log scaling is needed here, since the penalty factor is small
fig, ax = plot_heatmap(od_covar, cov_wavelength=760.0, figsize=(10, 10))
plt.show()
<xarray.DataArray (wavelength: 2, channel1: 28, channel2: 28)> Size: 13kB
array([[[ 5.07913635e-02,  4.54744404e-02,  4.54744404e-02, ...,
          4.54744404e-02,  4.54744404e-02,  4.54744404e-02],
        [ 4.54744404e-02,  1.56895307e-04,  2.25402930e-04, ...,
          2.41759032e-04,  2.76216376e-04,  9.30600163e-05],
        [ 4.54744404e-02,  2.25402930e-04,  1.70386835e-03, ...,
          1.97784287e-03,  2.58856492e-03,  8.23036523e-04],
        ...,
        [ 4.54744404e-02,  2.41759032e-04,  1.97784287e-03, ...,
          2.44574695e-03,  3.15103727e-03,  9.96720477e-04],
        [ 4.54744404e-02,  2.76216376e-04,  2.58856492e-03, ...,
          3.15103727e-03,  4.40790796e-03,  1.29882713e-03],
        [ 4.54744404e-02,  9.30600163e-05,  8.23036523e-04, ...,
          9.96720477e-04,  1.29882713e-03,  4.22596282e-04]],

       [[ 5.07913635e-02,  3.75545503e-02,  3.75545503e-02, ...,
          3.75545503e-02,  3.75545503e-02,  3.75545503e-02],
        [ 3.75545503e-02,  2.74457056e-04,  1.35641691e-04, ...,
          7.18770124e-05, -4.56328395e-05, -5.28042107e-06],
        [ 3.75545503e-02,  1.35641691e-04,  9.47287815e-04, ...,
          1.17153622e-03,  1.66195394e-03,  4.17517769e-04],
        ...,
        [ 3.75545503e-02,  7.18770124e-05,  1.17153622e-03, ...,
          1.67907429e-03,  2.36092571e-03,  5.78806258e-04],
        [ 3.75545503e-02, -4.56328395e-05,  1.66195394e-03, ...,
          2.36092571e-03,  4.07060265e-03,  8.57699132e-04],
        [ 3.75545503e-02, -5.28042107e-06,  4.17517769e-04, ...,
          5.78806258e-04,  8.57699132e-04,  2.32311204e-04]]])
Coordinates:
  * wavelength  (wavelength) float64 16B 760.0 850.0
  * channel1    (channel1) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16'
  * channel2    (channel2) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16'

(Weighted) Global Physiology Removal
[12]:
from cedalion.sigproc.physio import global_component_subtract
[13]:
import xarray as xr
# just another helper function to make the relevant things below easier to read
def plot_channel_wavelength(
    rec: xr.Dataset,
    dname: str,
    diff: dict,
    global_comp: xr.DataArray,
    channel: str,
    wavelength: float,
):
    f, ax = plt.subplots(1, 1, figsize=(12, 4))
    # Original signal
    ax.plot(
        rec["od"].time,
        rec["od"].sel({"channel": channel, "wavelength": wavelength}),
        "b-",
        label=f"{channel} {wavelength}nm (raw)",
    )
    # Corrected signal
    ax.plot(
        rec[dname].time,
        rec[dname].sel({"channel": channel, "wavelength": wavelength}),
        "g-",
        label=f"{channel} {wavelength}nm (corrected)",
    )
    # Global component
    ax.plot(
        global_comp.time,
        global_comp.sel({"wavelength": wavelength}),
        "y-",
        label=f"Global Component {wavelength}nm",
    )
    # Difference (raw − corrected)
    ax.plot(
        rec["od"].time,
        diff[dname].sel({"channel": channel, "wavelength": wavelength}),
        "r-",
        label="Difference (raw − corrected)",
    )
    ax.legend()
    ax.set_xlim(100, 200)
    ax.set_xlabel("time / s")
    ax.set_ylabel("Signal intensity / a.u.")
    plt.show()
First we get the original data and highpass filter it to remove slow drifts
[14]:
from cedalion.sigproc import frequency
# Refresh Data
rec["od"] = cedalion.nirs.int2od(rec["amp"])
# highpass filter data to remove slow drifts
rec["od"] = frequency.freq_filter(rec["od"], fmin=0.01*units.Hz, fmax=2*units.Hz, butter_order=4)
# initialize empty dictionary
diff = {}
(Fitted) Global Mean Subtraction
We can use global_component_subtract to remove the global average signal from each channel/vertex/voxel.
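Here "fitted" means that the global time course is not subtracted one-to-one, but first scaled to each channel with a least-squares coefficient. The following is a conceptual NumPy sketch on synthetic data, not the internal implementation of global_component_subtract:
import numpy as np

rng = np.random.default_rng(4)
n_channels, n_time = 6, 500

global_signal = np.sin(np.linspace(0, 20, n_time))   # shared physiology
gains = rng.uniform(0.5, 2.0, n_channels)             # channel-specific coupling to it
data = gains[:, None] * global_signal + 0.05 * rng.standard_normal((n_channels, n_time))

g = data.mean(axis=0)                 # global mean time course
beta = data @ g / (g @ g)             # least-squares scaling per channel
corrected = data - beta[:, None] * g  # subtract the fitted global component

print("variance before:", data.var(axis=1).round(3))
print("variance after :", corrected.var(axis=1).round(4))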
[15]:
dname = "od_corr_gm"
rec[dname], global_comp = global_component_subtract(rec["od"], ts_weights=None, k=0)
diff[dname] = rec["od"] - rec[dname]
# plot results for channel S1D2 at 760nm
plot_channel_wavelength(
rec=rec,
dname=dname,
diff=diff,
global_comp=global_comp,
channel="S1D2",
wavelength=760.0
)

Weighted Global Mean Subtraction
Since some channels might have a lot of artifacts or be noisy, we can use the variance from above as a proxy for channel measurement noise to downweight noisy channels in the global mean subtraction.
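The weighting changes how the global component itself is estimated: instead of a plain mean across channels, a noise-weighted mean gives noisy channels less influence. A self-contained sketch with synthetic data and weights = 1/variance:
import numpy as np

rng = np.random.default_rng(5)
n_channels, n_time = 6, 500
global_signal = np.sin(np.linspace(0, 20, n_time))
data = global_signal + 0.05 * rng.standard_normal((n_channels, n_time))
data[0] += 2.0 * rng.standard_normal(n_time)   # one very noisy channel

w = 1.0 / data.var(axis=1)                      # inverse-variance channel weights

plain_mean = data.mean(axis=0)
weighted_mean = (w[:, None] * data).sum(axis=0) / w.sum()

# the weighted estimate of the global component is much less contaminated by the noisy channel
print("corr(plain mean   , true global):", np.corrcoef(plain_mean, global_signal)[0, 1].round(3))
print("corr(weighted mean, true global):", np.corrcoef(weighted_mean, global_signal)[0, 1].round(3))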
[16]:
od_var = quality.measurement_variance(rec["od"], calc_covariance=False)
dname = "od_corr_wgm"
rec[dname], global_comp = global_component_subtract(rec["od"], ts_weights=1/od_var, k=0)
diff[dname] = rec["od"] - rec[dname]
# plot results for channel S1D2 at 760nm
plot_channel_wavelength(
rec=rec,
dname=dname,
diff=diff,
global_comp=global_comp,
channel="S1D2",
wavelength=760.0
)

Remove exactly the first Principal Component (unweighted)
Instead of the global mean we can also use PCA to find and remove global components.
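Conceptually, removing the first principal component amounts to an SVD of the mean-removed channel × time matrix and subtracting its rank-1 reconstruction, as in this synthetic sketch (not the cedalion internals):
import numpy as np

rng = np.random.default_rng(6)
n_channels, n_time = 8, 500
global_signal = np.sin(np.linspace(0, 30, n_time))
data = (rng.uniform(0.5, 2.0, n_channels)[:, None] * global_signal
        + 0.1 * rng.standard_normal((n_channels, n_time)))

# PCA via SVD of the mean-removed channel x time matrix
X = data - data.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 1                                           # remove exactly the first component
component = U[:, :k] @ np.diag(S[:k]) @ Vt[:k]
corrected = X - component

explained = S**2 / (S**2).sum()
print("variance explained by PC1:", explained[0].round(3))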
[17]:
dname = "od_corr_1pc"
rec[dname], global_comp = global_component_subtract(rec["od"], ts_weights=None, k=1)
diff[dname] = rec["od"] - rec[dname]
# plot results for channel S1D2 at 760nm
plot_channel_wavelength(
rec=rec,
dname=dname,
diff=diff,
global_comp=global_comp,
channel="S1D2",
wavelength=760.0
)

Remove 1 PCA component, using measurement-variance weights on the data
If we want, we can also include the channel weights from above in the PCA-based global signal removal.
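One common way to let channel weights enter a PCA (not necessarily what global_component_subtract does internally) is to scale each channel by the square root of its weight before the SVD and map the extracted component back afterwards:
import numpy as np

rng = np.random.default_rng(7)
n_channels, n_time = 8, 500
global_signal = np.sin(np.linspace(0, 30, n_time))
data = global_signal + 0.1 * rng.standard_normal((n_channels, n_time))
data[0] += 2.0 * rng.standard_normal(n_time)   # very noisy channel

w = 1.0 / data.var(axis=1)                      # inverse-variance channel weights

X = data - data.mean(axis=1, keepdims=True)
Xw = np.sqrt(w)[:, None] * X                    # scale channels by sqrt(weight)
U, S, Vt = np.linalg.svd(Xw, full_matrices=False)

# first weighted component, mapped back to the original (unweighted) channel space
component = (U[:, :1] @ np.diag(S[:1]) @ Vt[:1]) / np.sqrt(w)[:, None]
corrected = X - component

print("corr(weighted PC1, true global):", abs(np.corrcoef(Vt[0], global_signal)[0, 1]).round(3))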
[18]:
od_var = quality.measurement_variance(rec["od"], calc_covariance=False)
dname = "od_corr_w1pc"
rec[dname], global_comp = global_component_subtract(rec["od"], ts_weights= 1/od_var, k=1)
diff[dname] = rec["od"] - rec[dname]
# plot results for channel S1D2 at 760nm
plot_channel_wavelength(
rec=rec,
dname=dname,
diff=diff,
global_comp=global_comp,
channel="S1D2",
wavelength=760.0
)

Remove 95% of global variance (weighted)
Often we don't know exactly how many components to remove, but rather how much of the variance the removed components should explain. We can use k<1 to indicate the fraction of variance we want removed.
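Internally this boils down to keeping as many leading components as are needed to reach the requested fraction of the total variance, roughly as in this synthetic sketch (the interpretation of k<1 follows the description above):
import numpy as np

rng = np.random.default_rng(8)
n_channels, n_time = 8, 500
t = np.linspace(0, 30, n_time)
shared = np.vstack([np.sin(t), np.cos(0.3 * t)])            # two shared "global" components
mixing = rng.uniform(0.5, 2.0, (n_channels, 2))
data = mixing @ shared + 0.05 * rng.standard_normal((n_channels, n_time))

X = data - data.mean(axis=1, keepdims=True)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 0.95                                                    # fraction of variance to remove
explained = S**2 / (S**2).sum()
n_comp = int(np.searchsorted(np.cumsum(explained), k) + 1)

component = U[:, :n_comp] @ np.diag(S[:n_comp]) @ Vt[:n_comp]
corrected = X - component
print(f"removing {n_comp} of {n_channels} components "
      f"({np.cumsum(explained)[n_comp - 1]:.1%} of the variance)")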
[19]:
dname = "od_corr_w0.95pc"
rec[dname], global_comp = global_component_subtract(rec["od"], ts_weights=1/od_var, k=0.95)
diff[dname] = rec["od"] - rec[dname]
# plot results for channel S1D2 at 760nm
plot_channel_wavelength(
rec=rec,
dname=dname,
diff=diff,
global_comp=global_comp,
channel="S1D2",
wavelength=760.0
)

Overall comparison of the effects of the shown approaches
Lastly, let's look at the difference (raw − corrected) signals for all of the approaches. Note that the differences between methods can be much stronger for noisier data (our dataset here is quite clean).
[20]:
# plot the difference signal for channel S7D6, 760nm for every entry in diff and put the dnames in the legend
f, ax = plt.subplots(1, 1, figsize=(12,4))
for dname in diff.keys():
ax.plot(
rec["od"].time,
diff[dname].sel({ "channel": "S7D6", "wavelength": 760.0 }),
label=dname
)
ax.set_title("Difference between raw and corrected signals for channel S1D2, 760nm")
ax.legend()
ax.set_xlim(100, 200)
ax.set_xlabel("time / s")
ax.set_ylabel("Signal intensity / a.u.")
plt.show()
