Data Structures and I/O
[1]:
import numpy as np
import pandas as pd
import xarray as xr
import tempfile
from pathlib import Path
import cedalion
import cedalion.io
import cedalion.datasets
import cedalion.nirs
import cedalion.xrutils as xrutils
pd.set_option('display.max_rows', 10)
xr.set_options(display_expand_data=False);
[2]:
# helper function
def calc_concentratoin(rec):
od = cedalion.nirs.int2od(rec["amp"])
dpf = xr.DataArray([6, 6], dims="wavelength", coords={"wavelength" : od.wavelength})
return cedalion.nirs.od2conc(od, rec.geo3d, dpf)
Reading Snirf Files
Snirf files can be loaded with the cedalion.io.read_snirf
method. This returns a list of cedalion.dataclasses.Recording
objects. The
[3]:
path_to_snirf_file = cedalion.datasets.get_fingertapping_snirf_path()
recordings = cedalion.io.read_snirf(path_to_snirf_file)
display(path_to_snirf_file)
display(recordings)
display(len(recordings))
PosixPath('/home/runner/.cache/cedalion/fingertapping.zip.unzip/fingertapping/sub-01_task-tapping_nirs.snirf')
[<Recording | timeseries: ['amp'], masks: [], stim: ['1.0', '15.0', '2.0', '3.0'], aux_ts: [], aux_obj: []>]
1
Accessing example datasets
Example datasets are accessible through functions in cedalion.datasets
. These take care of downloading, caching and updating the data files. Often they also already load the data.
[4]:
rec = cedalion.datasets.get_fingertapping()
display(rec)
<Recording | timeseries: ['amp'], masks: [], stim: ['1.0', '15.0', '2.0', '3.0'], aux_ts: [], aux_obj: []>
Recording containers
The class cedalion.dataclasses.Recording
is Cedalion’s main data container to carry related data objects through the program. It can store time series, masks, auxiliary timeseries, probe, headmodel and stimulus information as well as meta data about the recording. It has the following properties:
field |
description |
---|---|
timeseries |
a dictionary of timeseries objects |
masks |
a dictionary of masks that flag time points as good or bad |
geo3d |
3D probe geometry |
geo2d |
2D probe geometry |
stim |
dataframe with stimulus information |
aux_tx |
dictionary of auxiliary time series objects |
aux_tx |
dictionary for any other auxiliary objects |
head_model |
voxel image, cortex and scalp surfaces |
meta_data |
dictionary for meta data |
container is very similar to the layout of a snirf file
Recording
maps mainly to nirs groupstimeseries objects map to data elements
Dictionaries in Recording
dictionaries are key value stores
maintain order in which values are added -> facilitate workflows
the user differentiates time series by name.
names are free to choose but there are a few canonical names used by
read_snirf
and expected bywrite_snirf
:
data type |
canonical name |
---|---|
unprocessed raw |
“amp” |
processed raw |
“amp” |
processed dOD |
“od” |
processed concentrations |
“conc” |
processed central moments” |
“moments” |
processed blood flow inddata_structures_oldex |
“bfi” |
processed HRF dOD |
“hrf_od” |
processed HRF central moments |
“hrf_moments” |
processed HRF concentrations” |
“hrf_conc” |
processed HRF blood flow index |
“hrf_bfi” |
processed absorption coefficient |
“mua” |
processed scattering coefficient |
“musp” |
Inspecting a Recording container
[5]:
display(rec.timeseries.keys())
display(type(rec.timeseries["amp"]))
odict_keys(['amp'])
xarray.core.dataarray.DataArray
[6]:
rec.meta_data
[6]:
OrderedDict([('SubjectID', 'P1'),
('MeasurementDate', '2020-01-01'),
('MeasurementTime', '13:16:16Z'),
('LengthUnit', 'm'),
('TimeUnit', 's'),
('FrequencyUnit', 'Hz'),
('DateOfBirth', '1986-01-01'),
('MNE_coordFrame', 4),
('sex', '1')])
Shortcut for accessing time series:
[7]:
rec["amp"] is rec.timeseries["amp"]
[7]:
True
Time Series
mulitvariate time series are stored in
xarray.DataArrays
if it has dimensions ‘channel’ and ‘time’ we call it a
NDTimeSeries
named dimensions
coordinates
physical units
[8]:
rec["amp"]
[8]:
<xarray.DataArray (channel: 28, wavelength: 2, time: 23239)> Size: 10MB [V] 0.09137 0.09099 0.09102 0.09044 0.09094 ... 1.277 1.273 1.273 1.273 1.276 Coordinates: * time (time) float64 186kB 0.0 0.128 0.256 ... 2.974e+03 2.974e+03 samples (time) int64 186kB 0 1 2 3 4 5 ... 23234 23235 23236 23237 23238 * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16' * wavelength (wavelength) float64 16B 760.0 850.0 Attributes: data_type_group: unprocessed raw
[9]:
rec["conc"] = calc_concentratoin(rec)
display(rec["conc"])
<xarray.DataArray 'concentration' (chromo: 2, channel: 28, time: 23239)> Size: 10MB [µM] 0.1336 0.00383 0.3486 0.1915 0.4156 ... -1.037 -1.004 -1.073 -1.067 -1.076 Coordinates: * chromo (chromo) <U3 24B 'HbO' 'HbR' * time (time) float64 186kB 0.0 0.128 0.256 ... 2.974e+03 2.974e+03 samples (time) int64 186kB 0 1 2 3 4 5 ... 23234 23235 23236 23237 23238 * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16'
Probe Geometry - geo3D
labeled points stored in 2D array
if it has a ‘label’ dimension and ‘label’ and ‘type’ coordinates we call it a
LabeledPointCloud
[10]:
rec.geo3d
[10]:
<xarray.DataArray (label: 55, digitized: 3)> Size: 1kB [m] -0.04161 0.0268 0.1299 -0.06477 0.05814 ... 0.06258 0.08226 0.01751 0.06619 Coordinates: type (label) object 440B PointType.SOURCE ... PointType.LANDMARK * label (label) <U3 660B 'S1' 'S2' 'S3' 'S4' 'S5' ... '25' '26' '27' '28' Dimensions without coordinates: digitized
Xarray functionality
Specify axis by name:
[11]:
amp = rec["amp"]
amp.mean("time")
[11]:
<xarray.DataArray (channel: 28, wavelength: 2)> Size: 448B [V] 0.09514 0.1902 0.2256 0.6123 0.1177 ... 0.485 0.5704 0.9371 0.6681 1.259 Coordinates: * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16' * wavelength (wavelength) float64 16B 760.0 850.0
get the second channel formed by S1 and D2:
[12]:
amp[1, :, :] # location-based indexing
amp.loc["S1D2", :, :] # label-based indexing
amp.sel(channel="S1D2") # label-based indexing
[12]:
<xarray.DataArray (wavelength: 2, time: 23239)> Size: 372kB [V] 0.2275 0.2297 0.2261 0.2259 0.2267 ... 0.6126 0.6136 0.6072 0.6087 0.6091 Coordinates: * time (time) float64 186kB 0.0 0.128 0.256 ... 2.974e+03 2.974e+03 samples (time) int64 186kB 0 1 2 3 4 5 ... 23234 23235 23236 23237 23238 channel <U4 16B 'S1D2' source object 8B 'S1' detector object 8B 'D2' * wavelength (wavelength) float64 16B 760.0 850.0 Attributes: data_type_group: unprocessed raw
Joins between two arrays:
[13]:
rec.geo3d.loc[amp.source]
[13]:
<xarray.DataArray (channel: 28, digitized: 3)> Size: 672B [m] -0.04161 0.0268 0.1299 -0.04161 0.0268 ... 0.06571 0.08189 0.02043 0.06571 Coordinates: type (channel) object 224B PointType.SOURCE ... PointType.SOURCE label (channel) <U3 336B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16' Dimensions without coordinates: digitized
[14]:
distances = xrutils.norm(rec.geo3d.loc[amp.source] - rec.geo3d.loc[amp.detector], "digitized")
display(distances)
<xarray.DataArray (channel: 28)> Size: 224B [m] 0.03929 0.03891 0.04089 0.00826 0.03718 ... 0.00732 0.04067 0.03344 0.007534 Coordinates: * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16'
Physical units:
[15]:
rec.masks["distance_mask"] = distances > 1.5 * cedalion.units.cm
display(rec.masks["distance_mask"])
<xarray.DataArray (channel: 28)> Size: 28B True True True False True True True ... False True True False True True False Coordinates: * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16'
Additional functionality through accessors:
[16]:
distances.pint.to("mm")
[16]:
<xarray.DataArray (channel: 28)> Size: 224B [mm] 39.29 38.91 40.89 8.26 37.18 37.6 ... 40.43 37.22 7.32 40.67 33.44 7.534 Coordinates: * channel (channel) object 224B 'S1D1' 'S1D2' 'S1D3' ... 'S8D8' 'S8D16' source (channel) object 224B 'S1' 'S1' 'S1' 'S1' ... 'S7' 'S8' 'S8' 'S8' detector (channel) object 224B 'D1' 'D2' 'D3' 'D9' ... 'D7' 'D8' 'D16'
Writing snirf files
pass
Recording
object tocedalion.io.write_snirf
caveat: many
Recording
fields have correspondants in snirf files, but not all.
[17]:
with tempfile.TemporaryDirectory() as tmpdir:
output_path = Path(tmpdir).joinpath("test.snirf")
cedalion.io.write_snirf(output_path, rec)