
Tool to run CABLE configurations without payu #698

@Whyborn


Why

We would like a lightweight runner that runs CABLE configurations designed for payu. We want a tool other than payu for two reasons:

  1. payu only operates on Gadi, and that's unlikely to change anytime soon. We want to be able to run on any machine with a standard Python installation.
  2. payu is fairly heavy. We want to make something very light that people can fire off without thinking.

What

What should the tool actually be able to do?

  1. Run a configuration the same way payu would, at least at the surface level: read the config.yaml and create the same input and output directories, e.g. by calling run-config from the configuration directory.
  2. Add metadata about the configuration being run to the output files of the run. The metadata should consist of:
  • The configuration repository
  • The configuration commit hash (with a -dirty for a modified config?)
  • The date of the run
  • Brief record of the inputs used
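A minimal sketch of how the repository and commit hash could be read using only the standard library and plain `git` calls. It assumes the configuration was cloned with a remote named `origin`; the function names (`git_output`, `repo_and_commit`) are hypothetical, and the `git` parameter exists only so the calls can be swapped out in tests.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence


def git_output(args: Sequence[str], cwd: str = ".") -> str:
    """Run a git command in cwd and return its stdout, stripped (hypothetical helper)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def repo_and_commit(
    cwd: str = ".", git: Callable[..., str] = git_output
) -> tuple[str, str]:
    """Return (repository URL, HEAD commit hash) for the configuration directory.

    Assumption: the remote of interest is called "origin".
    """
    url = git(["remote", "get-url", "origin"], cwd=cwd)
    commit = git(["rev-parse", "HEAD"], cwd=cwd)
    return url, commit
```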

What it doesn't need to do:

  1. Run multi-stage configurations; each configuration consists of a single call to cable.
  2. Proper provenance tracking of inputs, beyond recording file names. While this would be nice to have, I don't think it is feasible while keeping the tool lightweight and portable. Even minimal provenance would require checksum calculations, which can be expensive for inputs spread over many files, as atmospheric forcing often is. Some machines may have official catalogues with DOIs and such, but relying on these is certainly not portable.

On recording the commit hash

Anyone wanting to run in a different location will inevitably have to modify the configuration to point to the correct inputs on that machine. This will lead to a "dirty" repository. Is this desirable? Do we need to provide information about the diff? If we don't, it's not possible to determine whether "dirty" reflects a change in the input paths or a change in the science configuration. I would be in favour of preventing config.yaml from modifying the namelist altogether, so that it only configures the input locations and job submission. Then, if the repository is "dirty", we could provide a diff excluding config.yaml. Or perhaps we exclude config.yaml from the "dirty" check entirely?

My choice would be to ignore config.yaml changes in the "dirty" check, provided config.yaml can no longer modify the namelist. This way "dirty" strictly relates to science choices.
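A minimal sketch of a "dirty" check that ignores config.yaml, parsing `git status --porcelain` output. Assumptions: config.yaml sits at the repository root (porcelain paths are relative to the root), and only tracked files count, so untracked local files such as run outputs do not make the configuration dirty. The function names are hypothetical, and paths that git quotes (unusual characters) are not handled here.

```python
from __future__ import annotations

import subprocess

# Files whose changes should not mark the configuration as "dirty" (assumed set).
IGNORED = {"config.yaml"}


def git_status_porcelain(cwd: str = ".") -> str:
    """Porcelain status for tracked files only (hypothetical helper).

    The output is deliberately not stripped: the first status column may be a space.
    """
    return subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout


def changed_paths(porcelain: str) -> list[str]:
    """Extract paths from porcelain v1 lines: "XY <path>" or "XY <old> -> <new>"."""
    paths: list[str] = []
    for line in porcelain.splitlines():
        if line:
            paths.extend(line[3:].split(" -> "))
    return paths


def is_dirty(porcelain: str, ignored: set[str] = IGNORED) -> bool:
    """True if any changed path is not in the ignored set."""
    return any(path not in ignored for path in changed_paths(porcelain))


def config_hash(commit: str, porcelain: str) -> str:
    """Append "-dirty" to the commit hash only for changes outside config.yaml."""
    return f"{commit}-dirty" if is_dirty(porcelain) else commit
```

In the tool itself this would be called as `config_hash(commit, git_status_porcelain(config_dir))`, so editing only the input paths in config.yaml leaves the bare hash.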

On tracking inputs

As stated above, I think checksumming inputs is outside the remit of this tool, as it can easily get expensive. What we can do very cheaply is record the "date last modified" of each input. Then, if configuration results change unexpectedly, we can quickly check the "date last modified" of the inputs to determine whether they were a source of the change.
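A minimal sketch of the cheap alternative: one `<path> <date_last_modified>` line per input, using only `os.stat`. How the list of input paths is pulled out of config.yaml is left open here. For a directory input (e.g. forcing split over many files) it assumes the newest file inside is what matters, since a directory's own timestamp only changes when entries are added or removed. Function names are hypothetical.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone


def newest_mtime(path: str) -> float:
    """Modification time of a file, or of the newest file under a directory."""
    if os.path.isdir(path):
        times = [
            os.stat(os.path.join(root, name)).st_mtime
            for root, _, files in os.walk(path)
            for name in files
        ]
        # Fall back to the directory's own mtime if it contains no files.
        return max(times, default=os.stat(path).st_mtime)
    return os.stat(path).st_mtime


def last_modified(path: str) -> str:
    """ISO 8601 (UTC, whole seconds) timestamp of the input's last modification."""
    stamp = datetime.fromtimestamp(newest_mtime(path), tz=timezone.utc)
    return stamp.isoformat(timespec="seconds")


def inputs_record(paths: list[str]) -> str:
    """Build the proposed "inputs" string: one "<path> <date>" line per input."""
    return "".join(f"{path} {last_modified(path)}\n" for path in paths)
```

This only costs one `stat` call per file, so it stays cheap even for large forcing datasets, at the price of not detecting content changes that preserve timestamps.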

Summary

The tool will allow someone to call config-run from a payu configuration directory to run the configuration as payu run would. The tool will run on any machine with a standard Python installation. The model outputs would be the same as with payu, with the following additional metadata applied to each output:

  • "configuration_repo": <repo_URL>
  • "configuration_hash": <commit_hash, with optional "-dirty" suffix>
  • "date_created": <ISO 8601 date>
  • "inputs": "<path_to_input> <date_last_modified, ISO 8601>\n
    <path_to_input> <date_last_modified, ISO 8601>\n"
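A minimal sketch of assembling these four attributes as a plain dictionary. The function name `run_metadata` is hypothetical, and its arguments are assumed to come from the git and input-listing steps. If the outputs are netCDF, something like the netCDF4 package's `Dataset.setncatts` could write the dictionary as global attributes, though that would add a dependency beyond a standard Python installation.

```python
from __future__ import annotations

from datetime import datetime, timezone


def run_metadata(repo_url: str, config_hash: str, inputs: str) -> dict[str, str]:
    """Collect the proposed global attributes for each model output."""
    return {
        "configuration_repo": repo_url,
        "configuration_hash": config_hash,  # commit hash, possibly with "-dirty"
        # Time the run was recorded, in ISO 8601 with an explicit UTC offset.
        "date_created": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "inputs": inputs,  # "<path> <date_last_modified>\n" per input
    }
```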

@gabsun @SeanBryan51 How does this sound? Does this meet the specifications we outlined in our meeting? My only remaining question is how to distribute this: should it be a standalone package installable via pip or conda, or should it ship with CABLE directly?
