Getting started with contributing code to Cedalion

This document is a brief guide for contributors who would like to add code or new functionality to Cedalion. The toolbox is designed to be useful both for researchers who apply existing workflows and for developers who build new methods. Because the codebase is still growing, we follow a bottom-up approach: write simple functions with clear inputs and outputs first, then wrap them in higher-level abstractions once the API has stabilised. Develop and test in Jupyter notebooks (which can later be contributed as examples), then migrate mature code into src/cedalion/.

Where to get started

Familiarise yourself with these five resources before contributing:

Documentation: The rendered documentation is at doc.ibs.tu-berlin.de/cedalion/doc/dev/.
Example notebooks: Located in examples/, grouped by topic, and viewable in rendered form on the documentation site. Running and modifying existing notebooks is the fastest way to understand the API.
xarray: Cedalion’s primary data container. Familiarise yourself with xarray.DataArray and xarray.Dataset. All processing functions accept and return DataArrays. See the xarray documentation.
pint units: Cedalion tracks physical units using pint via pint-xarray. Import units with from cedalion import Quantity, units and attach them to variables, e.g. sd_distance = 3 * units.cm. Functions should accept pint.Quantity arguments wherever a physical unit is meaningful.
Data containers: The main container is Recording, which holds named timeseries DataArrays, optode positions, stimulus events, and optional head model data. The code snippet below shows how to load a recording and access its contents:

import cedalion
import cedalion.nirs.cw as nirs
import xarray as xr
from cedalion import units

rec = cedalion.data.get_fingertapping()   # returns a Recording
amp = rec.timeseries["amp"]              # raw amplitude, dims: (channel, wavelength, time)

od = nirs.int2od(amp)
dpf = xr.DataArray(
    [6.0, 6.0],
    dims="wavelength",
    coords={"wavelength": amp.wavelength},
)
conc = nirs.od2conc(od, rec.geo3d, dpf)  # HbO / HbR concentration, dims: (channel, chromo, time)

General rules and overview

Style guide for Python code

We follow PEP 8. The Ruff linter is configured in pyproject.toml and enforces rules E, F, W, and D (Google docstrings). Run ruff check src/ tests/ before committing.

Key conventions:

Functions and variables: snake_case — e.g. def example_function(), my_variable = 1.
Classes: PascalCase — e.g. class MyProcessor.
Module-level constants: UPPER_CASE — e.g. MAX_OVERFLOW = 100.
Max line length: 88 characters.

Style guide for docstrings

Use Google-style docstrings for all public functions, classes, and methods. Include Args:, Returns:, and Raises: sections where applicable.

Example — simple function:

def func(arg1: int, arg2: str) -> bool:
    """One sentence description of function.

    Some more details on what the function does.

    Args:
        arg1: Description of arg1.
        arg2: Description of arg2.

    Returns:
        Description of return value.
    """
    return True

Example — function with typed scientific arguments:

def func(
    arg1: cdt.NDTimeSeries,
    arg2: cdt.NDTimeSeries,
    arg3: Quantity,
) -> cdt.NDTimeSeries:
    """Implements algorithm XY based on :cite:t:`BIBTEXLABEL`.

    Some more details on what the function does.

    Args:
        arg1 (:class:`NDTimeSeries`, (channel, wavelength, time)): Description of
            first argument. For NDTimeSeries, specify expected dimensions.
        arg2 (:class:`NDTimeSeries`, (time, *)): Algorithms that operate only along
            one dimension (e.g. frequency filtering) are agnostic to other dimensions.
            Document this with ``(time, *)``.
        arg3 (:class:`Quantity`, [time]): Parameters with physical units should be
            passed as pint Quantities. Document the expected dimensionality.

    Returns:
        Description of return value.
    """
    return True

Literature references

Add references to cedalion/bibliography/references.bib with a unique BibTeX label. Cite them in docstrings with:

:cite:t:`BIBTEXLABEL`

In Markdown notebook cells, cite with:

{cite:t}`BIBTEXLABEL`

In functions, call:

cedalion.cite(BIBTEXLABEL)

All references are listed on the References page. If citations do not render correctly, check the Sphinx output for duplicate or malformed entries in references.bib.

Each call to cedalion.cite is recorded in the container cedalion.bib. The user can obtain references for all used methods by executing cedalion.bib.dump_to_notebook at the end of a notebook.

Where to add code

Incorporate new code into the existing module structure. Python files (modules) contain functions of the same category; directories (subpackages) contain related modules. For example:

Two artefact correction methods SplineSG and tPCA both belong in sigproc/motion.py.
motion.py and quality.py both belong in sigproc/.

Only create a new module or subpackage if the code genuinely does not fit anywhere existing. If in doubt, open an issue or pull request for discussion first.

GitHub workflow

Cedalion uses GitHub for version control and code review. The dev branch is always the most up-to-date integration branch — always branch off dev, never off main.

Fork or clone the repository and make sure your local dev is up to date:
```
git checkout dev
git pull upstream dev
```
Create a feature branch from dev:
```
git checkout -b feature/my-new-feature
```
Use a short, descriptive name prefixed with feature/ for new functionality or fix/ for bug fixes.
Develop, test, and lint your changes:
```
ruff check src/ tests/
pytest tests/
```
Open a pull request on GitHub targeting the dev branch (not main). Describe what the PR does and link any related issues. The code will be merged after review.

main reflects the latest stable release and is only updated by maintainers during a release process. Direct commits to main or dev are not accepted.

File and folder structure

The repository is organised into four top-level directories:

parent_directories

docs: Sphinx documentation source.
examples: Jupyter notebooks grouped by topic. Each notebook should be self-contained and annotated for a researcher new to the API. Add a notebook here whenever you introduce significant new functionality.
src: The library source code under src/cedalion/.
tests: pytest unit tests mirroring the src/cedalion/ structure.

The src/cedalion/ directory is organised as follows:

Directory	Purpose
data	Lookup tables and small bundled datasets.
dataclasses	Core containers: `Recording`, `Surface`, `PointType`, xarray schemas, and the `.cd` accessor.
geometry	3D geometry: optode registration, head segmentation, meshing, landmarks.
dot	DOT image reconstruction pipeline.
io	Reading and writing SNIRF, BIDS, probe geometries, anatomies, and other formats.
models	Data modelling, e.g. the General Linear Model (`models.glm`).
nirs	NIRS physics: `cw` (continuous wave), `fd` (frequency domain), `td` (time domain), `common` (extinction coefficients, channel distances).
sigdecomp	Signal decomposition methods not in standard libraries (ICA variants, CCA, mSPoC).
sigproc	Time-series signal processing: filtering, motion artefact correction, quality assessment, epoch extraction.
sim	Simulation and data augmentation: synthetic HRFs, artefacts, and toy datasets.
vis	Visualisation utilities.

Example: contributing new functionality

As a worked example we will add a channel quality check and pruning step — the kind of thing found in src/cedalion/sigproc/quality.py.

Decide where the code belongs

After browsing src/, we decide that quality-assessment helpers belong in sigproc/quality.py. Once added, users import them with:

import cedalion.sigproc.quality as quality

Bare-bones function template

import cedalion.dataclasses as cdc
import cedalion.typing as cdt
from cedalion import Quantity, units


@cdc.validate_schemas
def function_name(timeseries: cdt.NDTimeSeries, threshold: Quantity):
    """Short one-line summary.

    Args:
        timeseries: Input fNIRS data with at least ``channel`` and ``time`` dims.
        threshold: Quality threshold with appropriate units.

    Returns:
        Description of the return value.
    """

    # YOUR CODE

    return something

The @cdc.validate_schemas decorator checks that timeseries matches the NDTimeSeries schema (requires at least channel and time dimensions) and raises a descriptive error at runtime if the input does not match.

Worked example: SNR check and channel pruning

The current quality module provides snr, sci, and prune_ch. Here is how they fit together — illustrating the pattern you should follow for new metrics:

import cedalion.sigproc.quality as quality
from cedalion import units

# 1. Compute individual quality metrics (each returns a value array and a boolean mask)
snr, snr_mask = quality.snr(amp, snr_thresh=2.0)
sci, sci_mask = quality.sci(amp, window_length=5 * units.s, sci_thresh=0.7)

# 2. Combine masks and drop failing channels
#    operator="all"  →  keep channel only if it passes every mask
#    operator="any"  →  keep channel if it passes at least one mask
amp_pruned, dropped = quality.prune_ch(amp, [snr_mask, sci_mask], operator="all")

print("Dropped channels:", dropped)

Both snr_mask and sci_mask are boolean DataArrays (True = clean, False = tainted). prune_ch combines them and removes channels that fail. Add a new metric by writing a function that returns (metric, mask) with the same convention — it will plug directly into prune_ch.

For the full current implementation, see sigproc/quality.py.

Creating example notebooks

After adding a feature, create a Jupyter notebook in the appropriate subfolder of examples/. To set the thumbnail shown in the examples gallery, add the tag nbsphinx-thumbnail to the cell that produces the representative figure (the IBS logo is used by default if no tag is set).

Concluding remarks

While the toolbox continues to grow, we will add containers and abstraction layers to simplify usage. Whenever the existing environment does not yet provide the level of abstraction you need, develop bottom up: write simple functions with clear inputs and outputs. Once a general container ties together the relevant data, it is straightforward to refactor. We are working on higher-level pipeline mechanisms that will make it easy to assemble lower-level functions into reusable workflows.