Convert a fNIRS dataset to BIDS

Do not run all cells at once. Carefully read the comments before each code cell — some steps require you to manually modify certain files (e.g., the mapping CSV) before proceeding. Make sure all required edits are completed before continuing to the next step.

[1]:

# This cell sets up the environment when executed in Google Colab.
try:
    import google.colab
    !curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/dev/scripts/colab_setup.py -o colab_setup.py
    # Select branch with --branch "branch name" (default is "dev")
    %run colab_setup.py --branch "dev"
except ImportError:
    pass

[2]:

import os
import re
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd
import snirf2bids as s2b
from seedir import seedir
from rich import print_json

from cedalion.data import get_snirf2bids_example_dataset
from cedalion.io import bids

Convert your own dataset or an example

When the constant DEMO_MODE is set to True, an example dataset is used. Set it to False and modify the notebook variables as described below to convert a different dataset.

[3]:

DEMO_MODE = True

Provide file paths and metadata

This notebook shows how to convert an fNIRS dataset into BIDS format. To use it, provide the following inputs:

Dataset Path: Folder containing the raw dataset.
Destination Path: Folder where the BIDS-compliant dataset will be saved.
Mapping CSV File: CSV file that defines the dataset structure and provides necessary details for BIDS conversion.
(Optional) extra_meta_data File: Additional metadata to include in the dataset_description.json file. You can use this Google form or this website to create this file.
(Optional) participants.tsv / participants.json files: If you already have a participants.tsv/.json file, provide its path below and it will be used directly. Alternatively, if your participant-level metadata is stored in a CSV or Excel file (first column: participant ID; remaining columns: metadata with appropriate headers), provide its path below and the script will convert it into properly formatted .tsv and .json files for BIDS.

[4]:

if DEMO_MODE:
    dataset_path, edited_mapping_df_path = get_snirf2bids_example_dataset()

    temporary_directory = TemporaryDirectory()
    destination_path = Path(temporary_directory.name)

    print(f"dataset_path    : {dataset_path}\ndestination_path: {destination_path}\n")
    seedir(dataset_path)

else:
    dataset_path = Path('path-to-your-dataset-folder')  # REQUIRED
    destination_path = Path('path-to-your-destination-bids-folder') # REQUIRED

Downloading file 'snirf2bids_example_dataset.zip' from 'https://doc.ibs.tu-berlin.de/cedalion/datasets/dev/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/dev'.
Unzipping contents of '/home/runner/.cache/cedalion/dev/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/dev/snirf2bids_example_dataset.zip.unzip'

dataset_path    : /home/runner/.cache/cedalion/dev/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset
destination_path: /tmp/tmpahpn2f41

snirf2bids_example_dataset/
├─02262024_1100_473/
│ └─2024-02-26_010/
│   ├─2024-02-26_014_config.json
│   ├─2024-02-26_014.snirf
│   ├─2024-02-26_014_lsl.tri
│   ├─2024-02-26_014_config.hdr
│   ├─2024-02-26_014_calibration.json
│   ├─2024-02-26_014_probeInfo.mat
│   ├─2024-02-26_014_description.json
│   ├─2024-02-26_014.wl2
│   ├─2024-02-26_014.wl1
│   └─digpts.txt
├─snirf2BIDS_mapping_edited.csv
├─02272024_1030_474/
│ └─2024-02-27_010/
│   ├─2024-02-27_010_config.json
│   ├─2024-02-27_010_config.hdr
│   ├─2024-02-27_010_calibration.json
│   ├─2024-02-27_010.wl1
│   ├─2024-02-27_010_probeInfo.mat
│   ├─2024-02-27_010.snirf
│   ├─2024-02-27_010_description.json
│   ├─2024-02-27_010.wl2
│   ├─2024-02-27_010_lsl.tri
│   └─digpts.txt
└─readme.txt

[5]:

extra_meta_data_path = Path('path-to-your-meta-data')  # OPTIONAL
extra_meta_data_path = extra_meta_data_path if extra_meta_data_path.exists() else None

mapping_df_path = bids.get_snirf2bids_mapping_csv(dataset_path)
display(mapping_df_path)

'/home/runner/.cache/cedalion/dev/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset/snirf2BIDS_mapping.csv'

[6]:

participants_tsv_file = Path('path-to-your-participants.tsv') # OPTIONAL
participants_json_file = Path('path-to-your-participants.json') # OPTIONAL

Please modify the mapping CSV file, which is automatically created under your raw dataset folder.

By default, a mapping CSV file is generated under the main raw dataset folder using the get_snirf2bids_mapping_csv function. Before running the rest of the code, open this file, make any necessary edits, and save it. A valid mapping CSV must include all SNIRF files in your dataset, along with the following columns:

sub: Participant identifier
ses (optional): Session identifier
task: Task name or label
run (optional): Run number
acq (optional): Acquisition label
cond (optional): List of condition labels
cond_match (optional): List of matching condition values
duration (optional): Event duration in seconds

[7]:

if DEMO_MODE:
    # simulate user edits by replacing mapping_df_path with a prefilled one
    shutil.copy(edited_mapping_df_path, mapping_df_path)

mapping_df = pd.read_csv(mapping_df_path, dtype=str)
mapping_df.head(10)

[7]:

	current_name	sub	ses	task	run	acq	cond	cond_match	duration
0	02262024_1100_473/2024-02-26_010/2024-02-26_014	473	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN
1	02272024_1030_474/2024-02-27_010/2024-02-27_010	474	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN

The mapping table shown above is a key component for organizing and processing your dataset. The ses, run, and acq columns are optional and can be left empty (or set to None) if not applicable. The current_name column contains the path to each SNIRF file in your dataset, relative to the dataset folder and without the file extension.

Looking for possible *_scans.tsv files

To ensure no important information (e.g., acquisition time) from the original dataset is lost, we will:

Search Subdirectories: Traverse all subdirectories within the dataset.
Locate Existing Scan Files: Search for all *_scans.tsv files in the dataset.
Integrate into Mapping Table: Extract the relevant information from these files and add it to our mapping table.
Fall Back to SNIRF Files: Extract the acquisition time from the SNIRF file itself if it is missing from the *_scans.tsv files.

This approach ensures that any details, such as acquisition time, are retained and incorporated into the BIDS-compliant structure.
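
The search described above can be sketched with the standard library. This is an illustration only, not the implementation behind `bids.search_for_acq_time_in_scan_files`. The helper name `collect_acq_times` and the use of the file stem as the lookup key are assumptions; `filename` and `acq_time` are the column names BIDS uses in scans files.

```python
import csv
import tempfile
from pathlib import Path


def collect_acq_times(dataset_root):
    """Map each scan's file stem to its acq_time from all *_scans.tsv files."""
    acq_times = {}
    for scans_file in sorted(Path(dataset_root).rglob("*_scans.tsv")):
        with open(scans_file, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                stem = Path(row["filename"]).stem
                # An empty cell is treated as "no acquisition time available".
                acq_times[stem] = row.get("acq_time") or None
    return acq_times


# Build a tiny throwaway dataset to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    subject_dir = Path(tmp) / "sub-01"
    subject_dir.mkdir()
    (subject_dir / "sub-01_scans.tsv").write_text(
        "filename\tacq_time\n"
        "nirs/sub-01_task-rest_nirs.snirf\t2024-02-26T12:09:58\n"
    )
    found = collect_acq_times(tmp)

print(found)
```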

[8]:

mapping_df["filename_org"] = mapping_df["current_name"].apply(
    lambda x: os.path.basename(x))
scan_df = bids.search_for_acq_time_in_scan_files(dataset_path)

mapping_df = pd.merge(mapping_df, scan_df, on="filename_org", how="left")
mapping_df["acq_time"] = mapping_df.apply(
    bids.search_for_acq_time_in_snirf_files, axis=1, args=(dataset_path,)
)

mapping_df.head(10)

[8]:

	current_name	sub	ses	task	run	acq	cond	cond_match	duration	filename_org	acq_time
0	02262024_1100_473/2024-02-26_010/2024-02-26_014	473	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-26_014	2024-02-26 12:09:58
1	02272024_1030_474/2024-02-27_010/2024-02-27_010	474	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-27_010	2024-02-27 11:37:36

The acq_time information is retrieved from the original dataset's *_scans.tsv files where available, and otherwise from the SNIRF files themselves, and is then integrated into the mapping table.

Looking for possible *_sessions.tsv files

As with the *_scans.tsv files, we search the dataset path for *_sessions.tsv files to capture additional session-level metadata, such as acquisition times. Any relevant information from these files is added to the mapping table so that all session details are preserved.

[9]:

session_df = bids.search_for_sessions_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, session_df, on=["sub", "ses"], how="left")

mapping_df.head(10)

[9]:

	current_name	sub	ses	task	run	acq	cond	cond_match	duration	filename_org	acq_time	ses_acq_time
0	02262024_1100_473/2024-02-26_010/2024-02-26_014	473	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-26_014	2024-02-26 12:09:58	NaN
1	02272024_1030_474/2024-02-27_010/2024-02-27_010	474	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-27_010	2024-02-27 11:37:36	NaN

Converting the dataset

Create BIDS Folder Structure

The goal of this section is to rename the SNIRF files according to the BIDS naming convention and place them in the appropriate directory under destination_path, following the BIDS folder structure.

Steps:

Generate new filenames: Create BIDS-compliant filenames for all SNIRF records.
Determine file locations: Identify the appropriate locations for these files within the BIDS folder hierarchy.

This process ensures that the dataset adheres to BIDS standards for organization and naming.
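
To make the naming rule concrete, here is a minimal sketch of how a BIDS filename and its parent folder can be assembled from the mapping columns. The function `build_bids_name` is hypothetical and is not the code behind `bids.create_bids_standard_filenames`. It assumes missing entities are passed as None or an empty string (a NaN coming from pandas would need converting first) and follows the BIDS entity order sub, ses, task, acq, run.

```python
def build_bids_name(sub, task, ses=None, acq=None, run=None):
    """Return (filename, parent_path) for one SNIRF recording."""
    entities = [f"sub-{sub}"]
    if ses:
        entities.append(f"ses-{ses}")
    entities.append(f"task-{task}")
    if acq:
        entities.append(f"acq-{acq}")
    if run:
        entities.append(f"run-{run}")
    filename = "_".join(entities) + "_nirs.snirf"

    # The session level only appears in the folder hierarchy when a session is given.
    parent_parts = [f"sub-{sub}"] + ([f"ses-{ses}"] if ses else []) + ["nirs"]
    return filename, "/".join(parent_parts)


name_a, parent_a = build_bids_name("473", "ballsqueezing")
name_b, parent_b = build_bids_name("473", "ballsqueezing", ses="01", run="2")
print(name_a, parent_a)
print(name_b, parent_b)
```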

[10]:

mapping_df[["bids_name", "parent_path"]] = mapping_df.apply(
    bids.create_bids_standard_filenames, axis=1, result_type='expand')

mapping_df.head(10)

[10]:

	current_name	sub	ses	task	run	acq	cond	cond_match	duration	filename_org	acq_time	ses_acq_time	bids_name	parent_path
0	02262024_1100_473/2024-02-26_010/2024-02-26_014	473	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-26_014	2024-02-26 12:09:58	NaN	sub-473_task-ballsqueezing_nirs.snirf	sub-473/nirs
1	02272024_1030_474/2024-02-27_010/2024-02-27_010	474	NaN	ballsqueezing	NaN	NaN	NaN	NaN	NaN	2024-02-27_010	2024-02-27 11:37:36	NaN	sub-474_task-ballsqueezing_nirs.snirf	sub-474/nirs

To facilitate proper organization:

parent_path: Added to the mapping dataframe to define the location of each SNIRF file within destination_path.
bids_name: Specifies the new BIDS-compliant name for each file. In the following sections, we will rename all files to their corresponding bids_name and copy them to their designated parent_path.

[11]:

_ = mapping_df.apply(bids.copy_rename_snirf, axis=1, args=(dataset_path, destination_path))

Create BIDS-specific files (e.g., *_coordsystem.json)

In this step, we utilize the snirf2bids Python package to generate the necessary .tsv and .json files for the BIDS structure.

For every record, the following files will be created:

*_coordsystem.json
*_optodes.json
*_optodes.tsv
*_channels.tsv
*_events.json
*_events.tsv
*_nirs.json

These files are essential for ensuring the dataset adheres to BIDS standards.

[12]:

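# Let snirf2bids generate the sidecar files for every SNIRF file under destination_path.
s2b.snirf2bids_recurse(destination_path)

# Remove any scans and participants files present afterwards;
# they are created again in the following steps.
pattern = re.compile(r'.*_scans\.tsv$|^participants\.tsv$|^temp_participants\.tsv$')
files_to_delete = [file for file in destination_path.rglob('*') if file.is_file() and pattern.match(file.name)]
for file in files_to_delete:
    file.unlink()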

Create *_scans.tsv Files

Next, we create scans files for all subjects and sessions. The scan information found earlier in the original dataset path is now incorporated into the BIDS structure.

[13]:

scan_df = mapping_df[["sub", "ses", "bids_name", "acq_time"]].copy()
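# Assign the result instead of using inplace=True on a column selection,
# which newer pandas versions warn about or ignore.
scan_df["ses"] = scan_df["ses"].fillna("Unknown")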
scan_df = scan_df.groupby(["sub", "ses"])
scan_df.apply(lambda group: bids.create_scan_files(group, destination_path))
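
The missing ses values are filled before grouping because pandas drops rows whose group key is missing by default, so subjects without a session would otherwise be skipped. The small example below demonstrates this with made-up rows that mirror the mapping table; it is a sketch and does not call cedalion.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sub": ["473", "474"],
        "ses": [None, None],  # no session information, as in the example dataset
        "bids_name": [
            "sub-473_task-ballsqueezing_nirs.snirf",
            "sub-474_task-ballsqueezing_nirs.snirf",
        ],
    }
)

# Grouping on a key that is missing everywhere yields no groups at all.
groups_without_fill = len(df.groupby(["sub", "ses"]).size())

# After replacing the missing values, every subject forms its own group.
filled = df.copy()
filled["ses"] = filled["ses"].fillna("Unknown")
groups_with_fill = len(filled.groupby(["sub", "ses"]).size())

print(groups_without_fill, groups_with_fill)
```

An alternative would be to pass `dropna=False` to `groupby`, which keeps the missing keys as their own group.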

Create *_sessions.tsv Files

The next step is to create sessions files for all subjects. As with the scans files, the session information found earlier in the original dataset path is now used to create the corresponding BIDS sessions files.

[14]:

session_df = mapping_df[["sub", "ses", "ses_acq_time"]]
session_df = session_df.groupby(["sub"])
session_df.apply(lambda group: bids.create_session_files(group, destination_path))

Create and Integrate participants.tsv and participants.json

In this step, we gather available participant information and incorporate it into the BIDS structure.

If you want to use custom participant metadata, you should provide it at the beginning of the code, either as a participants.tsv file or as a CSV/Excel file.

If you provide a participants.tsv file but not a corresponding participants.json, you should fill out the participants.json manually to include descriptions for each field to comply with BIDS standards.
If you provide neither file, new participants.tsv and participants.json files will be automatically created with standard fields:
- species
- age
- sex
- handedness

You can also pass custom fields instead of these defaults when creating new files, as the cell below does with gender and age. This only applies if no valid TSV is provided.
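
For orientation, the sketch below shows what a freshly created pair of participants files could contain. The helper `make_participants` is hypothetical and is not how `bids.create_participants_files` is implemented. It assumes that missing values are written as "n/a", as BIDS expects in tabular files, and that each field gets a placeholder description to be filled in by hand.

```python
import csv
import io
import json


def make_participants(subject_ids, fields):
    """Return (tsv_text, json_text) with one row per subject and empty metadata."""
    output = io.StringIO()
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id"] + list(fields))
    for subject_id in sorted(set(subject_ids)):
        # BIDS tabular files mark missing values with "n/a".
        writer.writerow([f"sub-{subject_id}"] + ["n/a"] * len(fields))
    # Placeholder descriptions; replace them with real text before sharing the dataset.
    sidecar = {field: {"Description": f"TODO: describe {field}"} for field in fields}
    return output.getvalue(), json.dumps(sidecar, indent=2)


participants_tsv, participants_json = make_participants(["473", "474"], ["age", "sex"])
print(participants_tsv)
print(participants_json)
```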

[15]:

saved_participants = bids.create_participants_files(bids_dir=destination_path,
                                                    participants_tsv_path=participants_tsv_file,
                                                    participants_json_path=participants_json_file,
                                                    mapping_df=mapping_df,
                                                    fields=["gender", "age"])

No valid participants.tsv file found. Creating default files.

Create dataset description file

To create the dataset_description.json file, we follow these steps:

Search for an existing dataset_description.json in the dataset path and retain the provided information.
If extra_meta_data_path is specified, add the additional metadata about the dataset.
If neither dataset_description.json nor extra metadata is provided, use the basename of the dataset directory as the dataset name and set the BIDS version to ‘1.10.0’.
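
The three steps above amount to layering dictionaries. The following sketch illustrates the idea; `build_description` is a hypothetical helper, not the implementation of `bids.create_data_description`, and the precedence it uses (extra metadata overrides an existing description, which overrides the defaults) is an assumption.

```python
import json
from pathlib import Path


def build_description(dataset_dir, existing=None, extra=None):
    """Merge description sources; later sources override earlier ones."""
    description = {
        "Name": Path(dataset_dir).name,  # fall back to the folder name
        "BIDSVersion": "1.10.0",
    }
    description.update(existing or {})
    description.update(extra or {})
    return description


# The path is illustrative; only its last component is used.
default_description = build_description("/data/snirf2bids_example_dataset")
merged_description = build_description(
    "/data/snirf2bids_example_dataset",
    extra={"Authors": ["Enter author names here"], "License": "CC0"},
)
print(json.dumps(merged_description, indent=2))
```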

[16]:

bids.create_data_description(dataset_path, destination_path, extra_meta_data_path)

Check _coordsystem.json file

Since an empty string is not allowed for the NIRSCoordinateSystem key in the *_coordsystem.json file, we will populate it with “Other” to ensure BIDS compliance.
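
As a rough illustration of this check, the sketch below walks a BIDS folder and fills in empty values. The helper name `fill_empty_coordinate_system` is an assumption and the code is not taken from `bids.check_coord_files`; it only shows the kind of edit being made.

```python
import json
import tempfile
from pathlib import Path


def fill_empty_coordinate_system(bids_root, fallback="Other"):
    """Set NIRSCoordinateSystem to `fallback` wherever it is empty; return edited file names."""
    changed = []
    for coord_file in sorted(Path(bids_root).rglob("*_coordsystem.json")):
        content = json.loads(coord_file.read_text())
        if not content.get("NIRSCoordinateSystem"):
            content["NIRSCoordinateSystem"] = fallback
            coord_file.write_text(json.dumps(content, indent=4))
            changed.append(coord_file.name)
    return changed


# Exercise the helper on a throwaway folder with one incomplete file.
with tempfile.TemporaryDirectory() as tmp:
    nirs_dir = Path(tmp) / "sub-473" / "nirs"
    nirs_dir.mkdir(parents=True)
    coord_path = nirs_dir / "sub-473_coordsystem.json"
    coord_path.write_text(
        json.dumps({"NIRSCoordinateSystem": "", "NIRSCoordinateUnits": "mm"})
    )
    changed_files = fill_empty_coordinate_system(tmp)
    updated = json.loads(coord_path.read_text())

print(changed_files, updated)
```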

[17]:

bids.check_coord_files(destination_path)

Fix *_events.tsv order

Sort the rows of each *_events.tsv file by onset time.
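
One detail worth keeping in mind is that onsets must be compared as numbers, not as text, otherwise "100.5" would sort before "9.0". The sketch below shows such a sort with the standard library; `sort_events_by_onset` is a hypothetical helper and not the code behind `bids.sort_events`.

```python
import csv
import io


def sort_events_by_onset(tsv_text):
    """Return the TSV text with its rows ordered by numeric onset."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: float(row["onset"]))
    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()


# Made-up events in scrambled order.
unsorted_events = (
    "onset\tduration\ttrial_type\n"
    "100.5\t10\t2\n"
    "9.0\t10\t1\n"
    "25.25\t10\t2\n"
)
sorted_events = sort_events_by_onset(unsorted_events)
print(sorted_events)
```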

[18]:

_ = mapping_df.apply(bids.sort_events, axis=1, args=(destination_path,))

Edit *_events.tsv

To allow editing of the duration or trial_type columns in the *_events.tsv files, the mapping CSV file must include the following extra columns:

duration: Specifies the new duration for each SNIRF file that needs editing.
cond and cond_match:
- cond: A list of existing condition labels found in the SNIRF file (e.g., [1, 2]).
- cond_match: A list of new labels you want to use in place of those conditions (e.g., [“con”, “inc”]).

These two columns will be combined into a dictionary to update the trial_type column in the events file. This allows for relabeling of condition names in a BIDS-compliant way.
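
The pairing of cond and cond_match can be pictured as follows. This is a sketch under stated assumptions: the helper `build_relabel_map` is hypothetical, the CSV cells are assumed to hold Python-style list literals, and labels are compared as strings because TSV files store text. How `bids.edit_events` parses the cells is not shown in this notebook.

```python
import ast


def build_relabel_map(cond_cell, cond_match_cell):
    """Parse two list-like CSV cells and pair their entries."""
    cond = ast.literal_eval(cond_cell)
    cond_match = ast.literal_eval(cond_match_cell)
    if len(cond) != len(cond_match):
        raise ValueError("cond and cond_match must have the same length")
    # Events files store labels as text, so keys are compared as strings.
    return {str(old): str(new) for old, new in zip(cond, cond_match)}


relabel_map = build_relabel_map("[1, 2]", '["con", "inc"]')

# Labels without an entry in the map are left unchanged.
trial_types = ["1", "2", "1", "3"]
relabeled = [relabel_map.get(label, label) for label in trial_types]
print(relabel_map, relabeled)
```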

[19]:

_ = mapping_df.apply(bids.edit_events, axis=1, args=(destination_path,))

Creating sourcedata directory

Finally, you can keep a copy of your original data in a sourcedata directory under destination_path.

[20]:

bids.save_source(dataset_path, destination_path)

Inspecting the results

[21]:

seedir(destination_path)

tmpahpn2f41/
├─dataset_description.json
├─sub-473/
│ ├─sub-473_scans.tsv
│ └─nirs/
│   ├─sub-473_coordsystem.json
│   ├─sub-473_optodes.json
│   ├─sub-473_task-ballsqueezing_nirs.json
│   ├─sub-473_optodes.tsv
│   ├─sub-473_task-ballsqueezing_events.json
│   ├─sub-473_task-ballsqueezing_channels.tsv
│   ├─sub-473_task-ballsqueezing_events.tsv
│   └─sub-473_task-ballsqueezing_nirs.snirf
├─sub-474/
│ ├─sub-474_scans.tsv
│ └─nirs/
│   ├─sub-474_coordsystem.json
│   ├─sub-474_task-ballsqueezing_events.tsv
│   ├─sub-474_task-ballsqueezing_channels.tsv
│   ├─sub-474_optodes.json
│   ├─sub-474_task-ballsqueezing_events.json
│   ├─sub-474_task-ballsqueezing_nirs.json
│   ├─sub-474_task-ballsqueezing_nirs.snirf
│   └─sub-474_optodes.tsv
├─participants.json
├─participants.tsv
└─sourcedata/
  ├─02262024_1100_473/
  │ └─2024-02-26_010/
  │   ├─2024-02-26_014_config.json
  │   ├─2024-02-26_014.snirf
  │   ├─2024-02-26_014_lsl.tri
  │   ├─2024-02-26_014_config.hdr
  │   ├─2024-02-26_014_calibration.json
  │   ├─2024-02-26_014_probeInfo.mat
  │   ├─2024-02-26_014_description.json
  │   ├─2024-02-26_014.wl2
  │   ├─2024-02-26_014.wl1
  │   └─digpts.txt
  ├─snirf2BIDS_mapping.csv
  ├─snirf2BIDS_mapping_edited.csv
  ├─02272024_1030_474/
  │ └─2024-02-27_010/
  │   ├─2024-02-27_010_config.json
  │   ├─2024-02-27_010_config.hdr
  │   ├─2024-02-27_010_calibration.json
  │   ├─2024-02-27_010.wl1
  │   ├─2024-02-27_010_probeInfo.mat
  │   ├─2024-02-27_010.snirf
  │   ├─2024-02-27_010_description.json
  │   ├─2024-02-27_010.wl2
  │   ├─2024-02-27_010_lsl.tri
  │   └─digpts.txt
  └─readme.txt

[22]:

display(pd.read_table(destination_path / "participants.tsv"))

	participant_id	gender	age
0	sub-473	NaN	NaN
1	sub-474	NaN	NaN

[23]:

with open(destination_path / "participants.json") as fin:
    print_json(fin.read())

{
  "gender": null,
  "age": null
}

[24]:

with open(destination_path / "dataset_description.json") as fin:
    print_json(fin.read())

{
  "Name": "snirf2bids_example_dataset",
  "BIDSVersion": "1.10.0",
  "License": "CC0",
  "DatasetType": "raw",
  "Authors": [
    "Enter author names here"
  ],
  "Acknowledgements": "Enter acknowledgements here (e.g., funding sources, institutions).",
  "HowToAcknowledge": "Provide details on how to cite or acknowledge this dataset.",
  "DatasetDOI": "Enter DOI here if available.",
  "Funding": [
    "Enter funding details here, if applicable."
  ],
  "EthicsApprovals": [
    "Enter ethics approval details here, if applicable."
  ],
  "ReferencesAndLinks": [
    "Enter references or related links here, if applicable."
  ]
}