Convert an fNIRS dataset to BIDS

Do not run all cells at once. Carefully read the comments before each code cell: some steps require you to manually modify certain files (e.g., the mapping CSV) before proceeding. Make sure all required edits are completed before continuing to the next step.

[1]:
# This cell sets up the environment when executed in Google Colab.
try:
    import google.colab
    !curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/dev/scripts/colab_setup.py -o colab_setup.py
    # Select branch with --branch "branch name" (default is "dev")
    %run colab_setup.py --branch "dev"
except ImportError:
    pass
[2]:
import os
import re
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd
import snirf2bids as s2b
from seedir import seedir
from rich import print_json

from cedalion.datasets import get_snirf2bids_example_dataset
from cedalion.io import bids

Convert your own dataset or an example

When the constant DEMO_MODE is set to True, an example dataset is used. Set it to False and modify the notebook variables as described below to convert a different dataset.

[3]:
DEMO_MODE = True

Provide file paths and metadata

This notebook shows how to convert an fNIRS dataset into BIDS format. To use it, provide the following inputs:

  1. Dataset Path: Folder containing the raw dataset.

  2. Destination Path: Folder where the BIDS-compliant dataset will be saved.

  3. Mapping CSV File: CSV file that defines the dataset structure and provides necessary details for BIDS conversion.

  4. (Optional) extra_meta_data File: Additional metadata to include in the dataset_description.json file. You can use this Google Form or this website to create this file.

  5. (Optional) participants.tsv / participants.json files: If you already have a participants.tsv/.json file and provide its path below, it will be used directly. Alternatively, you can provide the path to a CSV or Excel file with participant-level metadata, in which the first column is the participant ID and the remaining columns are metadata with appropriate headers. The script will then convert it into properly formatted .tsv and .json files for BIDS.
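The conversion of a participant-level CSV described in item 5 can be sketched as follows. This is a minimal illustration using only the standard library, not the code that cedalion runs. The function name participants_csv_to_bids, the "sub-" prefixing, and the empty "Description" entries are assumptions made for this example.

```python
import csv
import io
import json


def participants_csv_to_bids(csv_text):
    """Turn participant metadata (CSV text) into TSV text and a JSON sidecar dict.

    Assumes the first column holds the participant ID and the remaining
    columns hold metadata with header names (hypothetical helper).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    fields = header[1:]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id"] + fields)
    for record in records:
        pid = record[0].strip()
        # BIDS participant IDs carry a "sub-" prefix.
        if not pid.startswith("sub-"):
            pid = f"sub-{pid}"
        # BIDS tabular files mark missing values as "n/a".
        values = [value.strip() or "n/a" for value in record[1:]]
        writer.writerow([pid] + values)

    # One sidecar entry per metadata column; descriptions are left to the user.
    sidecar = {field: {"Description": ""} for field in fields}
    return out.getvalue(), sidecar


example_csv = "id,age,sex\n473,25,F\n474,,M\n"
tsv_text, sidecar = participants_csv_to_bids(example_csv)
print(tsv_text)
print(json.dumps(sidecar, indent=2))
```

The sidecar descriptions are left blank on purpose, since only the dataset owner knows what each column means.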

[4]:
if DEMO_MODE:
    dataset_path, edited_mapping_df_path = get_snirf2bids_example_dataset()

    temporary_directory = TemporaryDirectory()
    destination_path = Path(temporary_directory.name)

    print(f"dataset_path    : {dataset_path}\ndestination_path: {destination_path}\n")
    seedir(dataset_path)

else:
    dataset_path = Path('path-to-your-dataset-folder')  # REQUIRED
    destination_path = Path('path-to-your-destination-bids-folder') # REQUIRED
Downloading file 'snirf2bids_example_dataset.zip' from 'https://doc.ibs.tu-berlin.de/cedalion/datasets/v25.1.0/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/v25.1.0'.
Unzipping contents of '/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip'
dataset_path    : /home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset
destination_path: /tmp/tmpsavc7z4i

snirf2bids_example_dataset/
├─readme.txt
├─snirf2BIDS_mapping_edited.csv
├─02272024_1030_474/
│ └─2024-02-27_010/
│   ├─2024-02-27_010_description.json
│   ├─2024-02-27_010_lsl.tri
│   ├─2024-02-27_010_config.json
│   ├─2024-02-27_010.wl1
│   ├─2024-02-27_010_calibration.json
│   ├─2024-02-27_010.snirf
│   ├─2024-02-27_010_probeInfo.mat
│   ├─2024-02-27_010.wl2
│   ├─2024-02-27_010_config.hdr
│   └─digpts.txt
└─02262024_1100_473/
  └─2024-02-26_010/
    ├─2024-02-26_014.snirf
    ├─2024-02-26_014.wl2
    ├─2024-02-26_014.wl1
    ├─2024-02-26_014_config.hdr
    ├─2024-02-26_014_probeInfo.mat
    ├─2024-02-26_014_lsl.tri
    ├─digpts.txt
    ├─2024-02-26_014_config.json
    ├─2024-02-26_014_description.json
    └─2024-02-26_014_calibration.json
[5]:
extra_meta_data_path = Path('path-to-your-meta-data')  # OPTIONAL
extra_meta_data_path = extra_meta_data_path if extra_meta_data_path.exists() else None

mapping_df_path = bids.get_snirf2bids_mapping_csv(dataset_path)
display(mapping_df_path)
'/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset/snirf2BIDS_mapping.csv'
[6]:
participants_tsv_file = Path('path-to-your-participants.tsv') # OPTIONAL
participants_json_file = Path('path-to-your-participants.json') # OPTIONAL

Please modify the mapping CSV file, which is automatically created under your raw dataset folder.

By default, a mapping CSV file is generated under the main raw dataset folder using the get_snirf2bids_mapping_csv function. Before running the rest of the code, open this file, make any necessary edits, and save it. A valid mapping CSV must include all SNIRF files in your dataset, along with the following columns:

  • sub: Participant identifier

  • ses (optional): Session identifier

  • task: Task name or label

  • run (optional): Run number

  • acq (optional): Acquisition label

  • cond (optional): List of condition labels

  • cond_match (optional): List of matching condition values

  • duration (optional): Event duration in seconds
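Before continuing, it can help to check the edited mapping CSV for empty required cells. The sketch below is a hypothetical helper, not part of cedalion. It assumes that current_name, sub, and task are the columns that must be filled, based on the list above and the table shown in the next output.

```python
import csv
import io

# Assumption: these are the columns that may not be left empty.
REQUIRED_COLUMNS = ["current_name", "sub", "task"]


def check_mapping(csv_text):
    """Return a list of human-readable problems found in a mapping CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty '{column}'")
    return problems


example = (
    "current_name,sub,ses,task\n"
    "a/rec1,473,,ballsqueezing\n"
    "b/rec2,474,,\n"
)
problems = check_mapping(example)
print(problems)
```

An empty list means every row has a participant, a task, and a file reference.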

[7]:
if DEMO_MODE:
    # simulate user edits by replacing mapping_df_path with a prefilled one
    shutil.copy(edited_mapping_df_path, mapping_df_path)

mapping_df = pd.read_csv(mapping_df_path, dtype=str)
mapping_df.head(10)
[7]:
current_name sub ses task run acq cond cond_match duration
0 02262024_1100_473/2024-02-26_010/2024-02-26_014 473 NaN ballsqueezing NaN NaN NaN NaN NaN
1 02272024_1030_474/2024-02-27_010/2024-02-27_010 474 NaN ballsqueezing NaN NaN NaN NaN NaN

The mapping table shown above is the key component for organizing and processing your dataset. The ses, run, and acq columns are optional and can be left empty if not applicable. The current_name column contains the path to each SNIRF file in your dataset.

Looking for possible *_scans.tsv files

To ensure no important information (e.g., acquisition time) from the original dataset is lost, we will:

  • Search Subdirectories: Traverse through all subdirectories within the dataset.

  • Locate Existing Scan Files: Search for all *_scans.tsv files in the dataset.

  • Integrate into Mapping Table: Extract the relevant information from these files and add it to our mapping table.

  • Fall Back to SNIRF Files: Extract the acquisition time from the SNIRF file if it is missing from the *_scans.tsv file.

This approach ensures that any details, such as acquisition time, are retained and incorporated into the BIDS-compliant structure.
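The search step can be sketched with pathlib and the csv module. This is an illustration under stated assumptions, not the implementation of bids.search_for_acq_time_in_scan_files. It assumes the scan files use the BIDS column names filename and acq_time, and the helper name collect_acq_times is hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def collect_acq_times(dataset_dir):
    """Map each file stem listed in any *_scans.tsv file to its acq_time."""
    acq_times = {}
    # rglob walks all subdirectories of the dataset folder.
    for scans_file in sorted(Path(dataset_dir).rglob("*_scans.tsv")):
        with open(scans_file, newline="") as fin:
            for row in csv.DictReader(fin, delimiter="\t"):
                # Keep only the file name without folders or extension.
                stem = Path(row["filename"]).name.split(".")[0]
                acq_times[stem] = row.get("acq_time") or None
    return acq_times


# Build a tiny throwaway dataset to demonstrate the search.
with tempfile.TemporaryDirectory() as tmp:
    sub_dir = Path(tmp) / "sub-01"
    sub_dir.mkdir()
    (sub_dir / "sub-01_scans.tsv").write_text(
        "filename\tacq_time\n"
        "nirs/sub-01_task-rest_nirs.snirf\t2024-02-26T12:09:58\n"
    )
    acq_times = collect_acq_times(tmp)

print(acq_times)
```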

[8]:
mapping_df["filename_org"] = mapping_df["current_name"].apply(
    lambda x: os.path.basename(x))
scan_df = bids.search_for_acq_time_in_scan_files(dataset_path)

mapping_df = pd.merge(mapping_df, scan_df, on="filename_org", how="left")
mapping_df["acq_time"] = mapping_df.apply(
    bids.search_for_acq_time_in_snirf_files, axis=1, args=(dataset_path,)
)

mapping_df.head(10)
[8]:
current_name sub ses task run acq cond cond_match duration filename_org acq_time
0 02262024_1100_473/2024-02-26_010/2024-02-26_014 473 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-26_014 2024-02-26 12:09:58
1 02272024_1030_474/2024-02-27_010/2024-02-27_010 474 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-27_010 2024-02-27 11:37:36

The acq_time column now holds the acquisition time of each recording, taken from the original dataset’s *_scans.tsv files or, where those are missing, from the SNIRF files themselves.

Looking for possible *_sessions.tsv files

As with the *_scans.tsv files, we search the dataset path for *_sessions.tsv files to capture additional session-level metadata, such as acquisition times. Any relevant information from these files is added to the mapping table so that all session details are preserved.

[9]:
session_df = bids.search_for_sessions_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, session_df, on=["sub", "ses"], how="left")

mapping_df.head(10)
[9]:
current_name sub ses task run acq cond cond_match duration filename_org acq_time ses_acq_time
0 02262024_1100_473/2024-02-26_010/2024-02-26_014 473 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-26_014 2024-02-26 12:09:58 NaN
1 02272024_1030_474/2024-02-27_010/2024-02-27_010 474 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-27_010 2024-02-27 11:37:36 NaN

Converting the dataset

Create BIDS Folder Structure

The goal of this section is to rename the SNIRF files according to the BIDS naming convention and place them in the appropriate directory under destination_path, following the BIDS folder structure.

Steps:

  1. Generate new filenames: Create BIDS-compliant filenames for all SNIRF records.

  2. Determine file locations: Identify the appropriate locations for these files within the BIDS folder hierarchy.

This process ensures that the dataset adheres to BIDS standards for organization and naming.
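The naming rule behind these two steps can be sketched as a small function. It is a simplified stand-in for bids.create_bids_standard_filenames, whose exact behavior is not shown here. The function name and signature are hypothetical, and the entity order (sub, ses, task, acq, run) follows the BIDS convention for NIRS recordings.

```python
def bids_name_and_parent(sub, task, ses=None, acq=None, run=None):
    """Build a BIDS file name and its folder, relative to the BIDS root."""
    entities = [("sub", sub), ("ses", ses), ("task", task), ("acq", acq), ("run", run)]
    # Optional entities are skipped when they are missing or empty.
    parts = [f"{key}-{value}" for key, value in entities if value not in (None, "")]
    bids_name = "_".join(parts) + "_nirs.snirf"

    folders = [f"sub-{sub}"]
    if ses not in (None, ""):
        folders.append(f"ses-{ses}")
    folders.append("nirs")
    return bids_name, "/".join(folders)


print(bids_name_and_parent("473", "ballsqueezing"))
print(bids_name_and_parent("473", "ballsqueezing", ses="01", run="2"))
```

For the demo dataset, which has no sessions or runs, the first call reproduces the bids_name and parent_path values shown in the table below.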

[10]:
mapping_df[["bids_name", "parent_path"]] = mapping_df.apply(
    bids.create_bids_standard_filenames, axis=1, result_type='expand')

mapping_df.head(10)
[10]:
current_name sub ses task run acq cond cond_match duration filename_org acq_time ses_acq_time bids_name parent_path
0 02262024_1100_473/2024-02-26_010/2024-02-26_014 473 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-26_014 2024-02-26 12:09:58 NaN sub-473_task-ballsqueezing_nirs.snirf sub-473/nirs
1 02272024_1030_474/2024-02-27_010/2024-02-27_010 474 NaN ballsqueezing NaN NaN NaN NaN NaN 2024-02-27_010 2024-02-27 11:37:36 NaN sub-474_task-ballsqueezing_nirs.snirf sub-474/nirs

To facilitate proper organization:

  • parent_path: Added to the mapping dataframe to define the location of each SNIRF file within destination_path.

  • bids_name: Specifies the new BIDS-compliant name for each file. In the following sections, we will rename all files to their corresponding bids_name and copy them to their designated parent_path.

[11]:
_ = mapping_df.apply(bids.copy_rename_snirf, axis=1, args=(dataset_path, destination_path))

Create BIDS specific files (e.g., _coordsystem.json)

In this step, we utilize the snirf2bids Python package to generate the necessary .tsv and .json files for the BIDS structure.

For every record, the following files will be created:

  1. _coordsystem.json

  2. _optodes.json

  3. _optodes.tsv

  4. *_channels.tsv

  5. *_events.json

  6. *_events.tsv

  7. *_nirs.json

These files are essential for ensuring the dataset adheres to BIDS standards.

[12]:
s2b.snirf2bids_recurse(destination_path)
pattern = re.compile(r'.*_scans\.tsv$|^participants\.tsv$|^temp_participants\.tsv$')
files_to_delete = [file for file in destination_path.rglob('*') if file.is_file() and pattern.match(file.name)]
for file in files_to_delete:
    file.unlink()

Create _scans.tsv Files

Next, we create scan files for all subjects and sessions. The scan information we collected earlier from the original dataset path is now incorporated into the BIDS structure.
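As a rough sketch of what such a scan file contains, the snippet below builds the text of a scans table from (bids_name, acq_time) pairs. It is not the code of bids.create_scan_files. The helper name scans_tsv_text, the nirs/ prefix on file names, and the conversion of timestamps to ISO 8601 are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def scans_tsv_text(records):
    """Build the text of a scans table from (bids_name, acq_time) pairs."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["filename", "acq_time"])
    for bids_name, acq_time in records:
        if acq_time:
            # Normalize "YYYY-MM-DD hh:mm:ss" to ISO 8601 with a "T" separator.
            acq_time = datetime.fromisoformat(acq_time).isoformat()
        else:
            acq_time = "n/a"
        # File names are given relative to the subject (or session) folder.
        writer.writerow([f"nirs/{bids_name}", acq_time])
    return out.getvalue()


text = scans_tsv_text([("sub-473_task-ballsqueezing_nirs.snirf", "2024-02-26 12:09:58")])
print(text)
```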

[13]:
scan_df = mapping_df[["sub", "ses", "bids_name", "acq_time"]].copy()
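# Assign the result instead of using inplace=True on a column selection,
# which is deprecated in recent pandas versions.
scan_df["ses"] = scan_df["ses"].fillna("Unknown")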
scan_df = scan_df.groupby(["sub", "ses"])
scan_df.apply(lambda group: bids.create_scan_files(group, destination_path))

Create _sessions.tsv Files

The next step is to create session files for all subjects. As with the scan files, we previously searched the original dataset path for any session information, which will now be used to create the corresponding BIDS session files.

[14]:
session_df = mapping_df[["sub", "ses", "ses_acq_time"]]
session_df = session_df.groupby(["sub"])
session_df.apply(lambda group: bids.create_session_files(group, destination_path))

Create and Integrate participants.tsv and participants.json

In this step, we gather available participant information and incorporate it into the BIDS structure.

If you want to use custom participant metadata, provide it at the beginning of the notebook, either as a participants.tsv file or as a CSV/Excel file.

  • If you provide a participants.tsv file but not a corresponding participants.json, you should fill out the participants.json manually to include descriptions for each field to comply with BIDS standards.

  • If you provide neither file, new participants.tsv and participants.json files will be automatically created with standard fields:

    • species

    • age

    • sex

    • handedness

You can also pass custom fields instead of these defaults when creating new files, as the cell below does with gender and age. This only applies if no valid TSV is provided.

[15]:
saved_participants = bids.create_participants_files(bids_dir=destination_path,
                                                    participants_tsv_path=participants_tsv_file,
                                                    participants_json_path=participants_json_file,
                                                    mapping_df=mapping_df,
                                                    fields=["gender", "age"])
No valid participants.tsv file found. Creating default files.

Create dataset description file

To create the dataset_description.json file, we follow these steps:

  1. Search for an existing dataset_description.json in the dataset path and retain the provided information.

  2. If extra_meta_data_path is specified, add the additional metadata about the dataset.

  3. If neither dataset_description.json nor extra metadata is provided, use the basename of the dataset directory as the dataset name and set the BIDS version to ‘1.10.0’.
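The three steps can be sketched as a merge of dictionaries. This is an illustration, not the implementation of bids.create_data_description. The function name build_description and the precedence (extra metadata overriding an existing description, which overrides the defaults) are assumptions.

```python
import json
from pathlib import Path


def build_description(dataset_dir, existing=None, extra=None):
    """Merge defaults, an existing description, and extra metadata (in that order)."""
    # Step 3: defaults derived from the dataset folder name.
    description = {"Name": Path(dataset_dir).name, "BIDSVersion": "1.10.0"}
    # Step 1: keep whatever an existing dataset_description.json provides.
    description.update(existing or {})
    # Step 2: add the optional extra metadata on top.
    description.update(extra or {})
    return description


minimal = build_description("/data/my_study")
merged = build_description(
    "/data/my_study", existing={"Name": "My Study"}, extra={"License": "CC0"}
)
print(json.dumps(merged, indent=2))
```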

[16]:
bids.create_data_description(dataset_path, destination_path, extra_meta_data_path)

Check _coordsystem.json file

Since an empty string is not allowed for the NIRSCoordinateSystem key in the *_coordsystem.json file, we will populate it with “Other” to ensure BIDS compliance.
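The check amounts to the following rule, shown here on a plain dictionary as it would be loaded from a *_coordsystem.json file. The helper name fill_coordinate_system is hypothetical. bids.check_coord_files applies the fix to the files on disk, and its internals may differ.

```python
def fill_coordinate_system(coordsystem):
    """Return a copy in which an empty NIRSCoordinateSystem becomes "Other"."""
    fixed = dict(coordsystem)
    value = fixed.get("NIRSCoordinateSystem")
    # Treat a missing key, None, and whitespace-only strings as empty.
    if not str(value or "").strip():
        fixed["NIRSCoordinateSystem"] = "Other"
    return fixed


print(fill_coordinate_system({"NIRSCoordinateSystem": ""}))
print(fill_coordinate_system({"NIRSCoordinateSystem": "MNI152NLin2009bAsym"}))
```

When "Other" is used, the BIDS specification additionally expects a NIRSCoordinateSystemDescription entry, so it is worth adding one by hand if you know how the optode positions were obtained.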

[17]:
bids.check_coord_files(destination_path)

Fix *_events.tsv order

Sort the events in each *_events.tsv file by onset time.
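What sorting by onset means for a single events table can be sketched as follows. The example works on TSV text with the standard library and is not the code of bids.sort_events. The helper name sort_events_tsv is hypothetical, and onset is assumed to be a numeric column in seconds.

```python
import csv
import io


def sort_events_tsv(tsv_text):
    """Return the events table with its rows ordered by ascending onset."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Compare onsets as numbers, not as strings ("12.5" < "3.0" as text).
    rows = sorted(reader, key=lambda row: float(row["onset"]))

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


unsorted_events = "onset\tduration\ttrial_type\n12.5\t5\t2\n3.0\t5\t1\n"
sorted_events = sort_events_tsv(unsorted_events)
print(sorted_events)
```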

[18]:
_ = mapping_df.apply(bids.sort_events, axis=1, args=(destination_path,))

Edit *_events.tsv

To allow editing of the duration or trial_type columns in the *_events.tsv files, the mapping CSV file must include the following extra columns:

  1. duration: Specifies the new duration for each SNIRF file that needs editing.

  2. cond and cond_match:

    • cond: A list of existing condition labels found in the SNIRF file (e.g., [1, 2]).

    • cond_match: A list of new labels you want to use in place of those conditions (e.g., ["con", "inc"]).

These two columns will be combined into a dictionary to update the trial_type column in the events file. This allows for relabeling of condition names in a BIDS-compliant way.
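The relabeling can be sketched as follows. This is an illustration rather than the code of bids.edit_events. It assumes the cond and cond_match cells hold Python-style list literals, as in the examples above, and that labels are compared as strings. The helper name relabel_trial_types is hypothetical.

```python
import ast


def relabel_trial_types(trial_types, cond, cond_match):
    """Replace old condition labels with new ones; unknown labels stay unchanged."""
    old_labels = ast.literal_eval(cond)        # e.g. "[1, 2]" -> [1, 2]
    new_labels = ast.literal_eval(cond_match)  # e.g. '["con", "inc"]' -> ["con", "inc"]
    if len(old_labels) != len(new_labels):
        raise ValueError("cond and cond_match must have the same length")

    # Combine the two lists into an old -> new dictionary.
    mapping = {str(old): new for old, new in zip(old_labels, new_labels)}
    return [mapping.get(str(label), label) for label in trial_types]


relabeled = relabel_trial_types(["1", "2", "1", "3"], "[1, 2]", '["con", "inc"]')
print(relabeled)
```

Labels without an entry in cond, such as "3" in this example, are left as they are.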

[19]:
_ = mapping_df.apply(bids.edit_events, axis=1, args=(destination_path,))

Creating sourcedata directory

Finally, you can keep a copy of your original data in a sourcedata directory under destination_path.

[20]:
bids.save_source(dataset_path, destination_path)

Inspecting the results

[21]:
seedir(destination_path)
tmpsavc7z4i/
├─sub-474/
│ ├─nirs/
│ │ ├─sub-474_task-ballsqueezing_nirs.json
│ │ ├─sub-474_coordsystem.json
│ │ ├─sub-474_task-ballsqueezing_events.json
│ │ ├─sub-474_task-ballsqueezing_events.tsv
│ │ ├─sub-474_optodes.json
│ │ ├─sub-474_task-ballsqueezing_nirs.snirf
│ │ ├─sub-474_optodes.tsv
│ │ └─sub-474_task-ballsqueezing_channels.tsv
│ └─sub-474_scans.tsv
├─participants.tsv
├─dataset_description.json
├─sourcedata/
│ ├─readme.txt
│ ├─snirf2BIDS_mapping_edited.csv
│ ├─snirf2BIDS_mapping.csv
│ ├─02272024_1030_474/
│ │ └─2024-02-27_010/
│ │   ├─2024-02-27_010_description.json
│ │   ├─2024-02-27_010_lsl.tri
│ │   ├─2024-02-27_010_config.json
│ │   ├─2024-02-27_010.wl1
│ │   ├─2024-02-27_010_calibration.json
│ │   ├─2024-02-27_010.snirf
│ │   ├─2024-02-27_010_probeInfo.mat
│ │   ├─2024-02-27_010.wl2
│ │   ├─2024-02-27_010_config.hdr
│ │   └─digpts.txt
│ └─02262024_1100_473/
│   └─2024-02-26_010/
│     ├─2024-02-26_014.snirf
│     ├─2024-02-26_014.wl2
│     ├─2024-02-26_014.wl1
│     ├─2024-02-26_014_config.hdr
│     ├─2024-02-26_014_probeInfo.mat
│     ├─2024-02-26_014_lsl.tri
│     ├─digpts.txt
│     ├─2024-02-26_014_config.json
│     ├─2024-02-26_014_description.json
│     └─2024-02-26_014_calibration.json
├─sub-473/
│ ├─nirs/
│ │ ├─sub-473_task-ballsqueezing_nirs.json
│ │ ├─sub-473_task-ballsqueezing_channels.tsv
│ │ ├─sub-473_coordsystem.json
│ │ ├─sub-473_optodes.json
│ │ ├─sub-473_optodes.tsv
│ │ ├─sub-473_task-ballsqueezing_nirs.snirf
│ │ ├─sub-473_task-ballsqueezing_events.json
│ │ └─sub-473_task-ballsqueezing_events.tsv
│ └─sub-473_scans.tsv
└─participants.json
[22]:
display(pd.read_table(destination_path / "participants.tsv"))
participant_id gender age
0 sub-473 NaN NaN
1 sub-474 NaN NaN
[23]:
with open(destination_path / "participants.json") as fin:
    print_json(fin.read())
{
  "gender": null,
  "age": null
}
[24]:
with open(destination_path / "dataset_description.json") as fin:
    print_json(fin.read())
{
  "Name": "snirf2bids_example_dataset",
  "BIDSVersion": "1.10.0",
  "License": "CC0",
  "DatasetType": "raw",
  "Authors": [
    "Enter author names here"
  ],
  "Acknowledgements": "Enter acknowledgements here (e.g., funding sources, institutions).",
  "HowToAcknowledge": "Provide details on how to cite or acknowledge this dataset.",
  "DatasetDOI": "Enter DOI here if available.",
  "Funding": [
    "Enter funding details here, if applicable."
  ],
  "EthicsApprovals": [
    "Enter ethics approval details here, if applicable."
  ],
  "ReferencesAndLinks": [
    "Enter references or related links here, if applicable."
  ]
}