Accelerate collect() by switching to lazy loading #340
Open
mikekryjak wants to merge 8 commits into master from
Conversation
Only opens one file, using metadata to construct Dask chunks for all other files. This greatly reduces the time needed to open a dataset.
Sorting out imports
If opening a set of NetCDF files that are all in the same directory, use lazy_open_boutdataset. This is a common use-case and is significantly faster this way. For more complicated cases (e.g. concatenating multiple BOUT++ runs), or if `lazy_load = False`, fall back to the old method.
Merge ds.metadata only if it exists.
Testing uses lists of datasets rather than glob string input.
Falls back to the original method if lazy loading doesn't work.
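The commits above describe the core idea: open only one file eagerly, then use its metadata (shape, dtype) to build Dask chunks for the remaining files, deferring all other I/O. A minimal sketch of that pattern using plain Dask is below; the file names and the `read_var` helper are hypothetical stand-ins, not xBOUT's actual API, and the data is fabricated so the example is self-contained.

```python
# Sketch: build lazy Dask chunks for many files after eagerly
# reading only the first one. `read_var` is a hypothetical reader.
import numpy as np
import dask
import dask.array as da


def read_var(path, shape):
    # Stand-in for reading one variable from one NetCDF file.
    # Fabricates data so the sketch runs without real files.
    return np.full(shape, fill_value=len(path), dtype=np.float64)


paths = [f"BOUT.dmp.{i}.nc" for i in range(4)]  # hypothetical names

# "Open" only the first file to discover shape and dtype.
template = read_var(paths[0], shape=(10, 8))
shape, dtype = template.shape, template.dtype

# One lazy chunk per remaining file; no I/O happens until .compute().
lazy_chunks = [da.from_array(template, chunks=shape)]
for p in paths[1:]:
    delayed = dask.delayed(read_var)(p, shape)
    lazy_chunks.append(da.from_delayed(delayed, shape=shape, dtype=dtype))

full = da.concatenate(lazy_chunks, axis=0)
print(full.shape)  # assembled lazily from metadata
```

Because only the first file is touched up front, opening a large multi-file dataset costs roughly one file-open plus metadata bookkeeping, which is where the speedup reported below comes from.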
This is another change that Claude came up with after #337. It rewrites xbout.load.collect() to use the new lazyload.lazy_open_boutdataset() from #336, falling back on the original method if things break. It also makes minor improvements to the code, such as not relying on the coordinates being in a specific order.

Test results:
So we are now 200x faster than before, and also 40% faster than boutdata. I suppose the difference could be the use of Dask.

For completeness, the same dataset takes 2.5s to load in its entirety using the latest lazy loading.
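The "falls back on the original method if things break" behaviour can be sketched as a simple try/except dispatch. The function and helper names below are illustrative placeholders, not xBOUT's actual internals:

```python
# Sketch of the fall-back behaviour: try the fast lazy path first,
# revert to the original eager loader if anything goes wrong.
def collect(varname, path=".", lazy_load=True):
    if lazy_load:
        try:
            return _lazy_collect(varname, path)  # fast path from #336
        except Exception:
            pass  # fall through to the proven eager implementation
    return _eager_collect(varname, path)  # original method


def _lazy_collect(varname, path):
    # Placeholder: would call lazy_open_boutdataset() here.
    raise NotImplementedError("lazy path unavailable in this sketch")


def _eager_collect(varname, path):
    return f"eagerly loaded {varname}"  # placeholder result


print(collect("Ne"))  # lazy path fails, so the eager loader answers
```

This keeps the lazy path an opt-in optimisation: any failure (unusual file layout, concatenated runs, `lazy_load = False`) lands on the old, well-tested code path rather than erroring out.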
This PR contains #336 and should be merged after.