Fast loading of multi-file BOUT++ datasets by bendudson · Pull Request #336 · boutproject/xBOUT

bendudson · 2026-02-27T06:24:10Z

Adds an xbout.lazy_open_boutdataset function that reads collections of dmp or restart files by opening one file, then using its metadata to construct Dask chunks for all other files without opening them. No merging of datasets is needed.
This greatly reduces the time needed to open a dataset: roughly 10x in the test case below (16.4 s down to 1.6 s).
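
To make the idea concrete, here is a minimal sketch of how a chunk layout could be planned from one file's metadata. It is not the code in this PR: the `ChunkSpec` and `plan_chunks` names are hypothetical, the parameters are modelled on BOUT++'s `NXPE`, `NYPE`, `MXSUB` and `MYSUB` metadata, the file numbering (x varying fastest) is an assumption, and guard and boundary cells are ignored. In a real implementation each planned chunk would back a lazily evaluated Dask array; that step is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkSpec:
    """Where one processor's output file sits in the global (x, y) grid."""

    path: str
    x_slice: slice
    y_slice: slice


def plan_chunks(prefix, nxpe, nype, mxsub, mysub):
    """Plan the chunk for every file using metadata read from one file.

    No file is opened here: the layout follows from the processor
    counts (nxpe, nype) and the per-processor sizes (mxsub, mysub).
    Guard and boundary cells are ignored to keep the sketch short.
    """
    chunks = []
    for index in range(nxpe * nype):
        # Assumed numbering: x varies fastest across processors.
        xind, yind = index % nxpe, index // nxpe
        chunks.append(
            ChunkSpec(
                path=f"{prefix}.{index}.nc",
                x_slice=slice(xind * mxsub, (xind + 1) * mxsub),
                y_slice=slice(yind * mysub, (yind + 1) * mysub),
            )
        )
    return chunks


# Hypothetical decomposition: 10 files, each holding 16 x-points and 5 y-points.
plan = plan_chunks("BOUT.dmp", nxpe=1, nype=10, mxsub=16, mysub=5)
print(plan[3].path, plan[3].y_slice)
```

Because the plan depends only on a handful of integers, its cost does not grow with the amount of data stored in the other files.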

xbout.open_boutdataset is modified so that when opening a collection of NetCDF files that are all in the same directory (a common use-case), lazy_open_boutdataset is used unless disabled with lazy_load=False.

Simple test case from hermes-perftest: 10 files, with (t, x, y, z) sizes of (1, 16, 50, 1) excluding X and Y boundaries.

Current approach:

import xbout
%time ds = xbout.open_boutdataset("./BOUT.dmp.*.nc", lazy_load=False)  # Time: 16.4 seconds

New default approach:

import xbout
%time ds = xbout.open_boutdataset("./BOUT.dmp.*.nc")     # Time: 1.6 seconds

Gridfile and geometry loading is handled in the same way as before.


mikekryjak · 2026-02-27T11:33:43Z

Wow, very cool! I see you made a new function for this. How come we can't just modify the original one?

bendudson · 2026-02-27T15:09:52Z

> Wow, very cool! I see you made a new function for this. How come we can't just modify the original one?

The next step is to modify the original open_boutdataset to call this function, then perform all the geometry stuff. I wanted to get this working first because open_boutdataset handles many different cases, e.g. pre-squashed files.

If opening a set of NetCDF files that are all in the same directory, use lazy_open_boutdataset. This is a common use-case and is significantly faster this way. For more complicated cases (e.g. concatenating multiple BOUT++ runs), or if `lazy_load = False`, fall back to the old method.
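
The dispatch rule above can be sketched as a small predicate. This is an illustration only: `should_lazy_load` is a hypothetical helper, and the check that xBOUT actually performs may differ (for example in how it recognises NetCDF files).

```python
from pathlib import Path


def should_lazy_load(paths, lazy_load=True):
    """Return True when the fast path described above would apply.

    Hypothetical helper, not part of xBOUT.
    """
    if not lazy_load or not paths:
        return False
    files = [Path(p) for p in paths]
    # Assumption: NetCDF files are identified by their ".nc" suffix.
    all_netcdf = all(f.suffix == ".nc" for f in files)
    # All files must share a single parent directory.
    same_directory = len({f.parent for f in files}) == 1
    return all_netcdf and same_directory


# Files from one run directory qualify; files from two runs do not.
print(should_lazy_load(["run/BOUT.dmp.0.nc", "run/BOUT.dmp.1.nc"]))
print(should_lazy_load(["run1/BOUT.dmp.0.nc", "run2/BOUT.dmp.0.nc"]))
```

Keeping the decision in one predicate makes the fallback to the old method explicit: anything that fails the check goes through the original loading path unchanged.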


mikekryjak · 2026-03-06T11:21:44Z

I load with absolute paths most of the time, and the glob you used only works for relative paths. I just pushed a fix.
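
The fix itself is in commit fdc96cf and is not reproduced here. As a hedged illustration of accepting both kinds of pattern, the sketch below uses the standard-library `glob.glob`, which handles absolute and relative patterns alike. The `expand_datapath` name is hypothetical.

```python
import glob
import os
import tempfile


def expand_datapath(datapath):
    """Expand a glob pattern, relative or absolute, into sorted file paths."""
    # glob.glob accepts absolute patterns, unlike pathlib.Path.glob,
    # which rejects non-relative patterns.
    return sorted(glob.glob(os.path.expanduser(datapath)))


# Demonstrate with an absolute pattern inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        open(os.path.join(tmp, f"BOUT.dmp.{i}.nc"), "w").close()
    found = expand_datapath(os.path.join(tmp, "BOUT.dmp.*.nc"))
    print(len(found), all(os.path.isabs(p) for p in found))
```

Note that a plain lexicographic sort places `BOUT.dmp.10.nc` before `BOUT.dmp.2.nc`, so code that depends on processor order would need to sort by the numeric index instead.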

bendudson added 2 commits February 26, 2026 22:10

Lazy loading of multi-file BOUT++ datasets

31b2998

Only opens one file, using metadata to construct Dask chunks for all other files. This greatly reduces the time needed to open a dataset.

lazyload: Fix ruff checks

21d0d75

Sorting out imports

bendudson changed the title ~~WIP: Fast loading of multi-file BOUT++ datasets~~ Fast loading of multi-file BOUT++ datasets Mar 4, 2026

bendudson added 3 commits March 4, 2026 15:25

utils._separate_metadata: Handle missing ds.metadata

1e03443

Merge ds.metadata only if it exists.

open_boutdataset: Handle when datapath is a list

4f648e3

Testing uses lists of datasets rather than glob string input.

open_boutdataset: Fix lazy load

d9e2382

bendudson requested review from ZedThree and johnomotani March 5, 2026 00:50

Fix loading of non-relative paths

fdc96cf

mikekryjak mentioned this pull request Mar 6, 2026

Accelerate collect() by switching to lazy loading #340

Open

mikekryjak mentioned this pull request Mar 17, 2026

Fix CI: add sudo apt-get update to actions #341

Open

bendudson wants to merge 7 commits into master from feature/lazy-load
