Accelerate collect() by switching to lazy loading #340

Open
mikekryjak wants to merge 8 commits into master from lazy-load-collect

@mikekryjak
Collaborator

This is another change that Claude came up with after #337. It rewrites xbout.load.collect() to use the new lazyload.lazy_open_boutdataset() from #336, and falls back on the original method if that fails. It also makes minor improvements to the code, such as no longer relying on the coordinates being in a specific order.

Test results:

  • xbout.collect() before change: 24.1s
  • xbout.collect() after change: 0.12s (!!)
  • boutdata.collect(): 0.20s

So we are now about 200x faster than before, and take 40% less time than boutdata (roughly 1.7x faster). I suppose the difference could be due to using Dask.
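
The ratios quoted above follow directly from the three timings. A quick check in Python, where the variable names are only illustrative:

```python
# Timings in seconds, copied from the test results above.
xbout_before = 24.1
xbout_after = 0.12
boutdata = 0.20

# Speed-up of the new xbout.collect() over the old one.
speedup_vs_before = xbout_before / xbout_after

# Fraction of boutdata.collect()'s wall time that is saved.
time_saved_vs_boutdata = 1 - xbout_after / boutdata

# The same comparison expressed as a speed-up factor.
speedup_vs_boutdata = boutdata / xbout_after

print(f"{speedup_vs_before:.0f}x faster than the old xbout.collect()")
print(f"{time_saved_vs_boutdata:.0%} less time than boutdata.collect()")
print(f"{speedup_vs_boutdata:.2f}x faster than boutdata.collect()")
```

These are single measurements on one dataset, so the factors are indicative rather than general.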

For completeness, the same dataset takes 2.5s to load in its entirety using the latest lazy loading.

This PR contains the commits from #336 and should be merged after it.

bendudson and others added 8 commits February 26, 2026 22:10
Only opens one file, using metadata to construct Dask chunks
for all other files. This greatly reduces the time needed
to open a dataset.
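
As a rough illustration of the idea in this commit message (not the code from the PR), the layout of every chunk can be computed from the processor-grid metadata alone, so only one file ever needs to be opened up front. The sketch below is plain Python without Dask. `NXPE`, `NYPE`, `MXSUB` and `MYSUB` are standard BOUT++ metadata entries, but the function name `chunk_layout`, the neglect of guard cells, and the assumption that file numbers run with the x processor index varying fastest are all simplifications made for this example.

```python
def chunk_layout(nxpe, nype, mxsub, mysub):
    """Map each file index to the (x, y) slices it would fill in the global array.

    Assumes (for illustration only) no guard cells and file index
    = ype * nxpe + xpe, i.e. the x processor index varies fastest.
    """
    layout = {}
    for ype in range(nype):
        for xpe in range(nxpe):
            index = ype * nxpe + xpe
            layout[index] = (
                slice(xpe * mxsub, (xpe + 1) * mxsub),
                slice(ype * mysub, (ype + 1) * mysub),
            )
    return layout


# Metadata as it might be read from a single dump file (made-up values).
metadata = {"NXPE": 2, "NYPE": 2, "MXSUB": 4, "MYSUB": 8}
layout = chunk_layout(
    metadata["NXPE"], metadata["NYPE"], metadata["MXSUB"], metadata["MYSUB"]
)
for index, (xs, ys) in sorted(layout.items()):
    print(f"file {index}: x[{xs.start}:{xs.stop}], y[{ys.start}:{ys.stop}]")
```

With a layout like this, each remaining file can be wrapped in a deferred read that is only triggered when its chunk is actually needed.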
Sorting out imports
If opening a set of NetCDF files that are all in the same directory,
use lazy_open_boutdataset. This is a common use-case and is
significantly faster this way.

For more complicated cases (e.g. concatenating multiple BOUT++ runs),
or if `lazy_load = False`, fall back to the old method.
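
The condition described in this commit message could be checked along the following lines. This is a minimal sketch under stated assumptions: `can_use_lazy_path` is a hypothetical helper, not a function from xbout, and the real check in the PR may look at other things as well.

```python
from pathlib import PurePath


def can_use_lazy_path(paths, lazy_load=True):
    """Return True if the fast path looks applicable (hypothetical helper).

    The fast path is taken only when lazy loading is enabled and all
    inputs are NetCDF files that sit in one and the same directory.
    """
    if not lazy_load or not paths:
        return False
    parents = {PurePath(p).parent for p in paths}
    suffixes = {PurePath(p).suffix for p in paths}
    return len(parents) == 1 and suffixes == {".nc"}


# A single run directory qualifies; files from two runs do not.
single_run = ["run/BOUT.dmp.0.nc", "run/BOUT.dmp.1.nc"]
two_runs = ["run1/BOUT.dmp.0.nc", "run2/BOUT.dmp.0.nc"]
print(can_use_lazy_path(single_run))
print(can_use_lazy_path(two_runs))
print(can_use_lazy_path(single_run, lazy_load=False))
```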
Merge ds.metadata only if it exists.
Testing uses lists of datasets rather than glob string input.
Falls back to the original method if this doesn't work.
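
The fallback behaviour mentioned in these commit messages follows a common pattern: try the fast route first and use the original one if it raises. A minimal sketch, in which `collect_with_fallback` and both loader arguments are hypothetical stand-ins rather than xbout functions:

```python
import warnings


def collect_with_fallback(varname, lazy_loader, original_loader):
    """Try the lazy loader first; fall back to the original loader on any error."""
    try:
        return lazy_loader(varname)
    except Exception as err:
        # Report why the fast path was abandoned instead of failing silently.
        warnings.warn(f"Lazy loading failed ({err!r}); using the original method")
        return original_loader(varname)


def broken_lazy_loader(varname):
    # Stand-in for a fast path that cannot handle this input.
    raise ValueError("unsupported layout")


def original_loader(varname):
    # Stand-in for the slower method that always works.
    return f"{varname} loaded by the original method"


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = collect_with_fallback("n", broken_lazy_loader, original_loader)
print(result)
```

Catching every `Exception` keeps existing workflows running, at the cost of possibly hiding bugs in the fast path, which is why the sketch emits a warning.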
@mikekryjak mikekryjak added the enhancement New feature or request label Mar 6, 2026

2 participants