Convert a fNIRS dataset to BIDS
Do not run all cells at once. Carefully read the comments before each code cell — some steps require you to manually modify certain files (e.g., the mapping CSV) before proceeding. Make sure all required edits are completed before continuing to the next step.
[1]:
# This cell sets up the environment when executed in Google Colab.
try:
    import google.colab
    !curl -s https://raw.githubusercontent.com/ibs-lab/cedalion/dev/scripts/colab_setup.py -o colab_setup.py
    # Select branch with --branch "branch name" (default is "dev")
    %run colab_setup.py --branch "dev"
except ImportError:
    pass
[2]:
import os
import re
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory
import pandas as pd
import snirf2bids as s2b
from seedir import seedir
from rich import print_json
from cedalion.datasets import get_snirf2bids_example_dataset
from cedalion.io import bids
Convert your own dataset or an example
When the constant DEMO_MODE is set to True, an example dataset is used. Set it to False and modify the notebook variables as described below to convert a different dataset.
[3]:
DEMO_MODE = True
Provide file paths and meta data
This notebook shows how to convert an fNIRS dataset into BIDS format. To use it, provide the following inputs:
Dataset Path: Folder containing the raw dataset.
Destination Path: Folder where the BIDS-compliant dataset will be saved.
Mapping CSV File: CSV file that defines the dataset structure and provides necessary details for BIDS conversion.
(Optional) extra_meta_data File: Additional metadata to include in the dataset_description.json file. You can use this google form or this website to create this file.
(Optional) participants.tsv / participants.json files: If you already have a participants.tsv/.json file and provide its path below, it will be used directly. Alternatively, if you have participant-level metadata in a CSV or Excel file, with the participant ID in the first column and the metadata in the remaining columns (with appropriate headers), provide its path below and the script will convert it into properly formatted .tsv and .json files for BIDS.
[4]:
if DEMO_MODE:
    dataset_path, edited_mapping_df_path = get_snirf2bids_example_dataset()
    temporary_directory = TemporaryDirectory()
    destination_path = Path(temporary_directory.name)
    print(f"dataset_path : {dataset_path}\ndestination_path: {destination_path}\n")
    seedir(dataset_path)
else:
    dataset_path = Path('path-to-your-dataset-folder') # REQUIRED
    destination_path = Path('path-to-your-destination-bids-folder') # REQUIRED
Downloading file 'snirf2bids_example_dataset.zip' from 'https://doc.ibs.tu-berlin.de/cedalion/datasets/v25.1.0/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/v25.1.0'.
Unzipping contents of '/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip' to '/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip'
dataset_path : /home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset
destination_path: /tmp/tmpsavc7z4i
snirf2bids_example_dataset/
├─readme.txt
├─snirf2BIDS_mapping_edited.csv
├─02272024_1030_474/
│ └─2024-02-27_010/
│ ├─2024-02-27_010_description.json
│ ├─2024-02-27_010_lsl.tri
│ ├─2024-02-27_010_config.json
│ ├─2024-02-27_010.wl1
│ ├─2024-02-27_010_calibration.json
│ ├─2024-02-27_010.snirf
│ ├─2024-02-27_010_probeInfo.mat
│ ├─2024-02-27_010.wl2
│ ├─2024-02-27_010_config.hdr
│ └─digpts.txt
└─02262024_1100_473/
└─2024-02-26_010/
├─2024-02-26_014.snirf
├─2024-02-26_014.wl2
├─2024-02-26_014.wl1
├─2024-02-26_014_config.hdr
├─2024-02-26_014_probeInfo.mat
├─2024-02-26_014_lsl.tri
├─digpts.txt
├─2024-02-26_014_config.json
├─2024-02-26_014_description.json
└─2024-02-26_014_calibration.json
[5]:
extra_meta_data_path = Path('path-to-your-meta-data') # OPTIONAL
extra_meta_data_path = extra_meta_data_path if extra_meta_data_path.exists() else None
mapping_df_path = bids.get_snirf2bids_mapping_csv(dataset_path)
display(mapping_df_path)
'/home/runner/.cache/cedalion/v25.1.0/snirf2bids_example_dataset.zip.unzip/snirf2bids_example_dataset/snirf2BIDS_mapping.csv'
[6]:
participants_tsv_file = Path('path-to-your-participants.tsv') # OPTIONAL
participants_json_file = Path('path-to-your-participants.json') # OPTIONAL
Please modify the mapping CSV file that is automatically created in your raw dataset folder.
By default, a mapping CSV file is generated under the main raw dataset folder using the get_snirf2bids_mapping_csv function. Before running the rest of the code, open this file, make any necessary edits, and save it. A valid mapping CSV must include all SNIRF files in your dataset, along with the following columns:
sub: Participant identifier
ses (optional): Session identifier
task: Task name or label
run (optional): Run number
acq (optional): Acquisition label
cond (optional): List of condition labels
cond_match (optional): List of matching condition values
duration (optional): Event duration in seconds
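Before continuing, it can help to check the edited mapping table for missing required entries. The following is a minimal sketch, not part of cedalion: the helper `check_mapping` and the choice of `current_name`, `sub`, and `task` as the required columns are assumptions derived from the list above.

```python
import pandas as pd

# Assumption: these are the columns that must be filled for every SNIRF file.
REQUIRED_COLUMNS = ["current_name", "sub", "task"]


def check_mapping(mapping):
    """Return a list of human-readable problems found in the mapping table."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if column not in mapping.columns:
            problems.append(f"missing column: {column}")
        elif mapping[column].isna().any():
            rows = mapping.index[mapping[column].isna()].tolist()
            problems.append(f"empty '{column}' in rows {rows}")
    return problems


# Toy table mirroring the example dataset; the second row lacks a task label.
example = pd.DataFrame(
    {
        "current_name": [
            "02262024_1100_473/2024-02-26_010/2024-02-26_014",
            "02272024_1030_474/2024-02-27_010/2024-02-27_010",
        ],
        "sub": ["473", "474"],
        "task": ["ballsqueezing", None],
    }
)
problems = check_mapping(example)
print(problems)
```

In the notebook, the same check could be applied to `mapping_df` right after it is read from the CSV file.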
[7]:
if DEMO_MODE:
    # simulate user edits by replacing mapping_df_path with a prefilled one
    shutil.copy(edited_mapping_df_path, mapping_df_path)
mapping_df = pd.read_csv(mapping_df_path, dtype=str)
mapping_df.head(10)
[7]:
| | current_name | sub | ses | task | run | acq | cond | cond_match | duration |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 02262024_1100_473/2024-02-26_010/2024-02-26_014 | 473 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN |
| 1 | 02272024_1030_474/2024-02-27_010/2024-02-27_010 | 474 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN |
The mapping table shown above serves as a key component for organizing and processing your dataset. The ses, run, and acq columns are optional and can be set to None if not applicable. The current_name column contains the path to the SNIRF files in your dataset.
Looking for possible *_scans.tsv files
To ensure no important information (e.g., acquisition time) from the original dataset is lost, we will:
Search subdirectories: Traverse all subdirectories within the dataset.
Locate existing scans files: Search for all *_scans.tsv files in the dataset.
Integrate into mapping table: Extract the relevant information from these files and add it to the mapping table.
Fall back to SNIRF files: Extract the acquisition time from the SNIRF files if it is missing in the *_scans.tsv file.
This approach ensures that any details, such as acquisition time, are retained and incorporated into the BIDS-compliant structure.
[8]:
mapping_df["filename_org"] = mapping_df["current_name"].apply(
    lambda x: os.path.basename(x))
scan_df = bids.search_for_acq_time_in_scan_files(dataset_path)
mapping_df = pd.merge(mapping_df, scan_df, on="filename_org", how="left")
mapping_df["acq_time"] = mapping_df.apply(
    bids.search_for_acq_time_in_snirf_files, axis=1, args=(dataset_path,)
)
mapping_df.head(10)
[8]:
| | current_name | sub | ses | task | run | acq | cond | cond_match | duration | filename_org | acq_time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02262024_1100_473/2024-02-26_010/2024-02-26_014 | 473 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-26_014 | 2024-02-26 12:09:58 |
| 1 | 02272024_1030_474/2024-02-27_010/2024-02-27_010 | 474 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-27_010 | 2024-02-27 11:37:36 |
The acq_time information is retrieved from the original dataset's *_scans.tsv files, or from the SNIRF files themselves when no scans entry is available, and integrated into the mapping table.
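The search-and-merge idea can be sketched with pandas and pathlib alone. This is an illustration only, not the implementation behind `bids.search_for_acq_time_in_scan_files`: the helper `collect_scans` is hypothetical, and it assumes each scans file has `filename` and `acq_time` columns.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd


def collect_scans(root):
    """Concatenate every *_scans.tsv below root into one lookup table."""
    paths = sorted(Path(root).rglob("*_scans.tsv"))
    if not paths:
        return pd.DataFrame(columns=["filename_org", "acq_time"])
    tables = [pd.read_csv(path, sep="\t", dtype=str) for path in paths]
    scans = pd.concat(tables, ignore_index=True)
    # Merge key: file name without directory and extension.
    scans["filename_org"] = scans["filename"].map(lambda name: Path(name).stem)
    return scans[["filename_org", "acq_time"]]


# Build a tiny throwaway dataset with a single scans file.
with TemporaryDirectory() as tmp:
    subject_dir = Path(tmp) / "sub-01"
    subject_dir.mkdir()
    (subject_dir / "sub-01_scans.tsv").write_text(
        "filename\tacq_time\nnirs/rec_001.snirf\t2024-02-26T12:09:58\n"
    )
    scans = collect_scans(tmp)

# A left merge keeps recordings without a scans entry; their acq_time is NaN.
mapping = pd.DataFrame({"filename_org": ["rec_001", "rec_002"]})
merged = mapping.merge(scans, on="filename_org", how="left")
print(merged)
```

Rows that remain empty after the merge are the ones for which the acquisition time has to come from the SNIRF file instead.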
Looking for possible *_sessions.tsv files
As with the *_scans.tsv files, we search the dataset path for *_sessions.tsv files to capture additional session-level metadata, such as acquisition times. Any relevant information from these files is added to the mapping table so that all session details are preserved.
[9]:
session_df = bids.search_for_sessions_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, session_df, on=["sub", "ses"], how="left")
mapping_df.head(10)
[9]:
| | current_name | sub | ses | task | run | acq | cond | cond_match | duration | filename_org | acq_time | ses_acq_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02262024_1100_473/2024-02-26_010/2024-02-26_014 | 473 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-26_014 | 2024-02-26 12:09:58 | NaN |
| 1 | 02272024_1030_474/2024-02-27_010/2024-02-27_010 | 474 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-27_010 | 2024-02-27 11:37:36 | NaN |
Converting the dataset
Create BIDS Folder Structure
The goal of this section is to rename the SNIRF files according to the BIDS naming convention and place them in the appropriate directory under destination_path, following the BIDS folder structure.
Steps:
Generate new filenames: Create BIDS-compliant filenames for all SNIRF records.
Determine file locations: Identify the appropriate locations for these files within the BIDS folder hierarchy.
This process ensures that the dataset adheres to BIDS standards for organization and naming.
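To make the naming scheme concrete, here is a minimal sketch of how a file name and folder could be derived from one row of the mapping table. It is not the code behind `bids.create_bids_standard_filenames`; the helper `bids_name_and_parent` is hypothetical and assumes the entity order sub, ses, task, acq, run with empty entities skipped.

```python
import pandas as pd


def bids_name_and_parent(row):
    """Build a BIDS file name and its parent folder from one mapping row."""
    entities = []
    for key in ("sub", "ses", "task", "acq", "run"):
        value = row.get(key)
        if pd.notna(value):  # skip optional entities that were left empty
            entities.append(f"{key}-{value}")
    bids_name = "_".join(entities) + "_nirs.snirf"

    parts = [f"sub-{row['sub']}"]
    if pd.notna(row.get("ses")):
        parts.append(f"ses-{row['ses']}")
    parts.append("nirs")
    return bids_name, "/".join(parts)


# One row without session and run, one row with both.
no_session = pd.Series(
    {"sub": "473", "ses": float("nan"), "task": "ballsqueezing",
     "run": float("nan"), "acq": float("nan")}
)
with_session = pd.Series(
    {"sub": "473", "ses": "01", "task": "ballsqueezing",
     "run": "1", "acq": float("nan")}
)
print(bids_name_and_parent(no_session))
print(bids_name_and_parent(with_session))
```

For the first row this yields the same name and folder that appear in the bids_name and parent_path columns of the example dataset.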
[10]:
mapping_df[["bids_name", "parent_path"]] = mapping_df.apply(
    bids.create_bids_standard_filenames, axis=1, result_type='expand')
mapping_df.head(10)
[10]:
| | current_name | sub | ses | task | run | acq | cond | cond_match | duration | filename_org | acq_time | ses_acq_time | bids_name | parent_path |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02262024_1100_473/2024-02-26_010/2024-02-26_014 | 473 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-26_014 | 2024-02-26 12:09:58 | NaN | sub-473_task-ballsqueezing_nirs.snirf | sub-473/nirs |
| 1 | 02272024_1030_474/2024-02-27_010/2024-02-27_010 | 474 | NaN | ballsqueezing | NaN | NaN | NaN | NaN | NaN | 2024-02-27_010 | 2024-02-27 11:37:36 | NaN | sub-474_task-ballsqueezing_nirs.snirf | sub-474/nirs |
To facilitate proper organization, two columns are added to the mapping dataframe:
parent_path: Defines the location of each SNIRF file within destination_path.
bids_name: Specifies the new BIDS-compliant name for each file.
In the following step, we rename all files to their corresponding bids_name and copy them to their designated parent_path.
[11]:
_ = mapping_df.apply(bids.copy_rename_snirf, axis=1, args=(dataset_path, destination_path))
Create BIDS specific files (e.g., _coordsystem.json)
In this step, we utilize the snirf2bids Python package to generate the necessary .tsv and .json files for the BIDS structure.
For every record, the following files will be created:
*_coordsystem.json
*_optodes.json
*_optodes.tsv
*_channels.tsv
*_events.json
*_events.tsv
*_nirs.json
These files are essential for ensuring the dataset adheres to BIDS standards.
[12]:
s2b.snirf2bids_recurse(destination_path)
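# Remove the scans and participants files written by snirf2bids;
# they are recreated from the mapping table in the following steps.
pattern = re.compile(r'.*_scans\.tsv$|^participants\.tsv$|^temp_participants\.tsv$')
files_to_delete = [file for file in destination_path.rglob('*') if file.is_file() and pattern.match(file.name)]
for file in files_to_delete:
    file.unlink()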
Create _scans.tsv Files
Now, we proceed to create scan files for all subjects and sessions. Previously, we searched the original dataset path for any provided scan information, which will now be incorporated into the BIDS structure.
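As an illustration of what such a scans file contains, the sketch below writes one `sub-<label>_scans.tsv` per subject with a `filename` and an `acq_time` column. It is not the code behind `bids.create_scan_files`: the helper `write_scans` is hypothetical, and it assumes a dataset without sessions, files stored in a `nirs/` subfolder, and ISO 8601 timestamps.

```python
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd

# Toy excerpt of the mapping table from the example dataset.
mapping = pd.DataFrame(
    {
        "sub": ["473", "474"],
        "bids_name": [
            "sub-473_task-ballsqueezing_nirs.snirf",
            "sub-474_task-ballsqueezing_nirs.snirf",
        ],
        "acq_time": ["2024-02-26 12:09:58", "2024-02-27 11:37:36"],
    }
)


def write_scans(group, bids_root):
    """Write the scans file for one subject and return its path."""
    sub = group["sub"].iloc[0]
    timestamps = pd.to_datetime(group["acq_time"]).dt.strftime("%Y-%m-%dT%H:%M:%S")
    scans = pd.DataFrame(
        {"filename": "nirs/" + group["bids_name"], "acq_time": timestamps}
    )
    target = Path(bids_root) / f"sub-{sub}" / f"sub-{sub}_scans.tsv"
    target.parent.mkdir(parents=True, exist_ok=True)
    scans.to_csv(target, sep="\t", index=False, na_rep="n/a")
    return target


with TemporaryDirectory() as tmp:
    written = [write_scans(group, tmp) for _, group in mapping.groupby("sub")]
    first = pd.read_csv(written[0], sep="\t")
print(first)
```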
[13]:
scan_df = mapping_df[["sub", "ses", "bids_name", "acq_time"]].copy()
scan_df["ses"] = scan_df["ses"].fillna("Unknown")
scan_df = scan_df.groupby(["sub", "ses"])
scan_df.apply(lambda group: bids.create_scan_files(group, destination_path))
Create _sessions.tsv Files
The next step is to create session files for all subjects. As with the scan files, we previously searched the original dataset path for any session information, which will now be used to create the corresponding BIDS session files.
[14]:
session_df = mapping_df[["sub", "ses", "ses_acq_time"]]
session_df = session_df.groupby(["sub"])
session_df.apply(lambda group: bids.create_session_files(group, destination_path))
Create and Integrate participants.tsv and participants.json
In this step, we gather available participant information and incorporate it into the BIDS structure.
If you want to use custom participant metadata, you should provide it at the beginning of the code, either as a participants.tsv file or as a CSV/Excel file.
If you provide a participants.tsv file but not a corresponding participants.json, you should fill out the participants.json manually to include descriptions for each field to comply with BIDS standards.
If you provide neither file, new participants.tsv and participants.json files will be automatically created with standard fields:
species
age
sex
handedness
You can also pass your favourite/custom fields instead of these defaults when creating new files (only applies if no valid TSV is provided).
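The default case can be pictured with a short sketch that writes a participants table with one row per subject and placeholder values. This is not the implementation of `bids.create_participants_files`; the helper `write_default_participants`, the `n/a` placeholders, and the sidecar layout with a `Description` entry per field are assumptions for illustration.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

import pandas as pd


def write_default_participants(subjects, fields, bids_root):
    """Write participants.tsv and participants.json with placeholder values."""
    bids_root = Path(bids_root)
    ids = sorted({f"sub-{label}" for label in subjects})
    table = pd.DataFrame({"participant_id": ids})
    for field in fields:
        table[field] = "n/a"  # placeholder for missing values
    table.to_csv(bids_root / "participants.tsv", sep="\t", index=False)

    # One description stub per field, to be completed by hand.
    sidecar = {field: {"Description": f"Describe '{field}' here"} for field in fields}
    (bids_root / "participants.json").write_text(json.dumps(sidecar, indent=2))
    return table


with TemporaryDirectory() as tmp:
    write_default_participants(
        ["473", "474", "473"], ["species", "age", "sex", "handedness"], tmp
    )
    participants = pd.read_csv(
        Path(tmp) / "participants.tsv", sep="\t", keep_default_na=False
    )
    sidecar = json.loads((Path(tmp) / "participants.json").read_text())
print(participants)
```

Reading the file back with pandas defaults turns the `n/a` placeholders into NaN, which is why `keep_default_na=False` is used here.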
[15]:
saved_participants = bids.create_participants_files(
    bids_dir=destination_path,
    participants_tsv_path=participants_tsv_file,
    participants_json_path=participants_json_file,
    mapping_df=mapping_df,
    fields=["gender", "age"],
)
No valid participants.tsv file found. Creating default files.
Create data description file
To create the dataset_description.json file, we follow these steps:
Search for an existing dataset_description.json in the dataset path and retain the provided information.
If extra_meta_data_path is specified, add the additional metadata about the dataset.
If neither dataset_description.json nor extra metadata is provided, use the basename of the dataset directory as the dataset name and set the BIDS version to ‘1.10.0’.
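The three steps amount to layering dictionaries on top of a fallback. The sketch below is a hedged illustration rather than the logic of `bids.create_data_description`: the helper `build_description` is hypothetical, and the precedence (extra metadata overrides an existing description, which overrides the fallback) is an assumption.

```python
import json


def build_description(dataset_name, existing=None, extra=None):
    """Combine fallback values, an existing description, and extra metadata."""
    description = {"Name": dataset_name, "BIDSVersion": "1.10.0"}  # fallback
    description.update(existing or {})  # values found in the original dataset
    description.update(extra or {})     # user-supplied extra metadata
    return description


fallback = build_description("snirf2bids_example_dataset")
merged = build_description(
    "snirf2bids_example_dataset",
    existing={"Name": "My study", "License": "CC0"},
    extra={"Authors": ["A. Author"]},
)
print(json.dumps(merged, indent=2))
```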
[16]:
bids.create_data_description(dataset_path, destination_path, extra_meta_data_path)
Check _coordsystem.json file
Since an empty string is not allowed for the NIRSCoordinateSystem key in the *_coordsystem.json file, we will populate it with "Other" to ensure BIDS compliance.
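A minimal sketch of such a check, using only the standard library, is shown here. It is not the code of `bids.check_coord_files`; the helper `fill_coordinate_system` is hypothetical and only touches files whose NIRSCoordinateSystem entry is empty or missing.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def fill_coordinate_system(bids_root):
    """Set NIRSCoordinateSystem to "Other" wherever it is empty or missing."""
    changed = []
    for path in sorted(Path(bids_root).rglob("*_coordsystem.json")):
        content = json.loads(path.read_text())
        if not content.get("NIRSCoordinateSystem"):
            content["NIRSCoordinateSystem"] = "Other"
            path.write_text(json.dumps(content, indent=4))
            changed.append(path)
    return changed


# Create a throwaway sidecar with an empty coordinate system and repair it.
with TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "sub-473" / "nirs" / "sub-473_coordsystem.json"
    sidecar.parent.mkdir(parents=True)
    sidecar.write_text(
        json.dumps({"NIRSCoordinateSystem": "", "NIRSCoordinateUnits": "mm"})
    )
    changed = fill_coordinate_system(tmp)
    result = json.loads(sidecar.read_text())
print(result)
```

When "Other" is used, a BIDS validator may additionally ask for a free-text description of the coordinate system, so it is worth reviewing the sidecar by hand afterwards.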
[17]:
bids.check_coord_files(destination_path)
Fix *_events.tsv order
Sort the rows of each *_events.tsv file by onset time.
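For a single events table, the sorting step boils down to one pandas call. The snippet is a sketch on made-up data, not the implementation of `bids.sort_events`.

```python
import io

import pandas as pd

# Made-up events table whose rows are out of order.
raw = "onset\tduration\ttrial_type\n12.5\t5\t2\n3.0\t5\t1\n7.25\t5\t2\n"
events = pd.read_csv(io.StringIO(raw), sep="\t")

# A stable sort keeps the original order of events that share an onset.
events_sorted = events.sort_values("onset", kind="stable").reset_index(drop=True)
print(events_sorted)
```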
[18]:
_ = mapping_df.apply(bids.sort_events, axis=1, args=(destination_path,))
Edit *_events.tsv
To allow editing of the duration or trial_type columns in the *_events.tsv files, the mapping CSV file must include the following extra columns:
duration: Specifies the new duration for each SNIRF file that needs editing.
cond and cond_match:
cond: A list of existing condition labels found in the SNIRF file (e.g., [1, 2]).
cond_match: A list of new labels you want to use in place of those conditions (e.g., ["con", "inc"]).
These two columns will be combined into a dictionary to update the trial_type column in the events file. This allows for relabeling of condition names in a BIDS-compliant way.
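The relabeling can be sketched as follows. This is an illustration under assumptions, not the code of `bids.edit_events`: the helper `relabel_events` is hypothetical, the two mapping cells are assumed to hold Python-style list literals, and labels without a match are left unchanged.

```python
import ast

import pandas as pd


def relabel_events(events, cond, cond_match, duration=None):
    """Rename trial_type labels and optionally overwrite the duration."""
    old_labels = ast.literal_eval(cond)        # e.g. "[1, 2]" -> [1, 2]
    new_labels = ast.literal_eval(cond_match)  # e.g. '["con", "inc"]'
    if len(old_labels) != len(new_labels):
        raise ValueError("cond and cond_match must have the same length")
    # Compare labels as strings so that 1 and "1" are treated alike.
    lookup = {str(old): new for old, new in zip(old_labels, new_labels)}

    result = events.copy()
    labels = result["trial_type"].astype(str)
    result["trial_type"] = labels.map(lookup).fillna(labels)
    if duration is not None:
        result["duration"] = float(duration)
    return result


events = pd.DataFrame(
    {"onset": [3.0, 7.25, 12.5], "duration": [5.0, 5.0, 5.0], "trial_type": [1, 2, 3]}
)
relabeled = relabel_events(events, "[1, 2]", '["con", "inc"]', duration="10")
print(relabeled)
```

Conditions 1 and 2 become "con" and "inc", while the unmatched condition 3 keeps its original label.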
[19]:
_ = mapping_df.apply(bids.edit_events, axis=1, args=(destination_path,))
Creating sourcedata directory
Finally, you can keep a copy of your original data in the sourcedata directory under destination_path.
[20]:
bids.save_source(dataset_path, destination_path)
Inspecting the results
[21]:
seedir(destination_path)
tmpsavc7z4i/
├─sub-474/
│ ├─nirs/
│ │ ├─sub-474_task-ballsqueezing_nirs.json
│ │ ├─sub-474_coordsystem.json
│ │ ├─sub-474_task-ballsqueezing_events.json
│ │ ├─sub-474_task-ballsqueezing_events.tsv
│ │ ├─sub-474_optodes.json
│ │ ├─sub-474_task-ballsqueezing_nirs.snirf
│ │ ├─sub-474_optodes.tsv
│ │ └─sub-474_task-ballsqueezing_channels.tsv
│ └─sub-474_scans.tsv
├─participants.tsv
├─dataset_description.json
├─sourcedata/
│ ├─readme.txt
│ ├─snirf2BIDS_mapping_edited.csv
│ ├─snirf2BIDS_mapping.csv
│ ├─02272024_1030_474/
│ │ └─2024-02-27_010/
│ │ ├─2024-02-27_010_description.json
│ │ ├─2024-02-27_010_lsl.tri
│ │ ├─2024-02-27_010_config.json
│ │ ├─2024-02-27_010.wl1
│ │ ├─2024-02-27_010_calibration.json
│ │ ├─2024-02-27_010.snirf
│ │ ├─2024-02-27_010_probeInfo.mat
│ │ ├─2024-02-27_010.wl2
│ │ ├─2024-02-27_010_config.hdr
│ │ └─digpts.txt
│ └─02262024_1100_473/
│ └─2024-02-26_010/
│ ├─2024-02-26_014.snirf
│ ├─2024-02-26_014.wl2
│ ├─2024-02-26_014.wl1
│ ├─2024-02-26_014_config.hdr
│ ├─2024-02-26_014_probeInfo.mat
│ ├─2024-02-26_014_lsl.tri
│ ├─digpts.txt
│ ├─2024-02-26_014_config.json
│ ├─2024-02-26_014_description.json
│ └─2024-02-26_014_calibration.json
├─sub-473/
│ ├─nirs/
│ │ ├─sub-473_task-ballsqueezing_nirs.json
│ │ ├─sub-473_task-ballsqueezing_channels.tsv
│ │ ├─sub-473_coordsystem.json
│ │ ├─sub-473_optodes.json
│ │ ├─sub-473_optodes.tsv
│ │ ├─sub-473_task-ballsqueezing_nirs.snirf
│ │ ├─sub-473_task-ballsqueezing_events.json
│ │ └─sub-473_task-ballsqueezing_events.tsv
│ └─sub-473_scans.tsv
└─participants.json
[22]:
display(pd.read_table(destination_path / "participants.tsv"))
| | participant_id | gender | age |
|---|---|---|---|
| 0 | sub-473 | NaN | NaN |
| 1 | sub-474 | NaN | NaN |
[23]:
with open(destination_path / "participants.json") as fin:
    print_json(fin.read())
{
  "gender": null,
  "age": null
}
[24]:
with open(destination_path / "dataset_description.json") as fin:
    print_json(fin.read())
{
  "Name": "snirf2bids_example_dataset",
  "BIDSVersion": "1.10.0",
  "License": "CC0",
  "DatasetType": "raw",
  "Authors": [
    "Enter author names here"
  ],
  "Acknowledgements": "Enter acknowledgements here (e.g., funding sources, institutions).",
  "HowToAcknowledge": "Provide details on how to cite or acknowledge this dataset.",
  "DatasetDOI": "Enter DOI here if available.",
  "Funding": [
    "Enter funding details here, if applicable."
  ],
  "EthicsApprovals": [
    "Enter ethics approval details here, if applicable."
  ],
  "ReferencesAndLinks": [
    "Enter references or related links here, if applicable."
  ]
}