Use ROS3 HDF5 driver or fsspec with local sparse cache for more efficient access. #307

@yarikoptic

Relates to #264, which could possibly be avoided altogether by never fetching an .nwb file in full twice. This might also be of interest for https://github.com/OpenSourceBrain/DANDIArchiveShowcase, which @anhknguyen96 is working on.

https://pynwb.readthedocs.io/en/stable/tutorials/advanced_io/streaming.html gives an example of how to use the ros3 HDF5 driver to access a remote file in an S3 bucket (e.g. dandiarchive) without downloading it in full.
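For reference, a minimal sketch of the ros3 approach, along the lines of that tutorial. The function name `open_nwb_ros3` and the restriction to http(s) URLs are assumptions made for illustration, and it only works if h5py was built against an HDF5 library with the ros3 driver enabled:

```python
def open_nwb_ros3(url):
    """Open a remote .nwb file for reading via the HDF5 ros3 driver.

    Hypothetical helper for illustration; not part of pynwb or this project.
    """
    # Assumption of this sketch: only plain http(s) object URLs are accepted.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")

    # Imported lazily so the URL check above works without these packages.
    import h5py
    from pynwb import NWBHDF5IO

    # ros3 is an optional HDF5 feature; many binary builds do not include it.
    if "ros3" not in h5py.registered_drivers():
        raise RuntimeError("this h5py/HDF5 build lacks the ros3 driver")

    # HDF5 issues ranged reads against the object; nothing is downloaded in full.
    return NWBHDF5IO(url, mode="r", load_namespaces=True, driver="ros3")


# Usage (the blob path is a placeholder, not a real asset):
#   io = open_nwb_ros3("https://dandiarchive.s3.amazonaws.com/blobs/<placeholder>")
#   nwbfile = io.read()
#   io.close()
```

Note that ros3 on its own keeps no local cache, so repeated runs re-fetch the same byte ranges.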

Another approach is HDF5-agnostic and uses fsspec, but it would require pynwb to be able to open from an existing file handle, which I am not sure is possible; filed NeurodataWithoutBorders/pynwb#1525. (An alternative is a FUSE file system like the one provided by https://github.com/datalad/datalad-fuse/ for that file. That might be too ad hoc/heavy, although it is quite feasible by FUSE-mounting an entire bucket whenever a request comes in and using a local cache with some garbage-collection routines to prune it once in a while.)
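A rough sketch of the fsspec variant, stopping at the h5py level since handing the resulting handle to pynwb is exactly the open question in NeurodataWithoutBorders/pynwb#1525. The helper name `open_h5_sparse_cached` and the default cache directory `nwb-cache` are made up for illustration; it assumes fsspec (with its HTTP backend) and h5py are installed:

```python
def open_h5_sparse_cached(url, cache_dir="nwb-cache"):
    """Open a remote HDF5 file through fsspec with a local block cache.

    Hypothetical helper for illustration. Returns (h5file, remote_file);
    close both when done.
    """
    # Assumption of this sketch: only plain http(s) object URLs are accepted.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")

    # Imported lazily so the URL check above works without these packages.
    import fsspec
    import h5py
    from fsspec.implementations.cached import CachingFileSystem

    # CachingFileSystem stores only the blocks that were actually read,
    # so the on-disk copy under cache_dir stays sparse.
    fs = CachingFileSystem(
        fs=fsspec.filesystem("http"),
        cache_storage=str(cache_dir),
    )
    remote_file = fs.open(url, "rb")

    # h5py accepts a Python file-like object in place of a file name.
    h5file = h5py.File(remote_file, "r")
    return h5file, remote_file


# Usage (the blob path is a placeholder, not a real asset):
#   h5file, f = open_h5_sparse_cached(
#       "https://dandiarchive.s3.amazonaws.com/blobs/<placeholder>")
#   print(list(h5file.keys()))
#   h5file.close(); f.close()
```

Unlike ros3, this does not depend on how HDF5 was built, and blocks already fetched are served from `cache_dir` on later runs. Pruning that directory would still need to be handled separately.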
