
Container Workflow Needs WorkDir Variable in Precedent #1671

@ax3l


Hi @shuds13,

As discussed today: we have a nice HPC workflow running with optimas/libEnsemble that can use containers for the individually executed commands/runs in TemplateEvaluator. Currently, we use podman-hpc, but the approach would work with other HPC-focused container managers, too.

The best practice for doing many runs inside the same container is to:

  • start a container detached (run -d)
  • exec the individual simulations in it (1-N times)
  • finally, stop the container
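A minimal Python sketch of this lifecycle, assuming podman-hpc forwards the standard podman `run -d --name`, `exec --workdir` and `stop` options. The helper `persistent_container`, the image name, and the keep-alive command (`sleep infinity`, assumed to exist in the image) are hypothetical, and file-system mount options are left out:

```python
import shlex
import subprocess
from contextlib import contextmanager


@contextmanager
def persistent_container(name, image, manager="podman-hpc",
                         keep_alive=("sleep", "infinity"), runner=None):
    """Hypothetical helper: start a detached container once, yield an exec-command
    builder for the individual runs, and stop the container at the end."""
    # By default really invoke the container manager; check=True raises on failure.
    run = runner or (lambda argv: subprocess.run(argv, check=True))

    # 1. start the container detached (mount options would go here; omitted)
    run([manager, "run", "-d", "--name", name, image, *keep_alive])
    try:
        def exec_cmd(command, workdir):
            # 2. one `exec` command line per simulation, with its own in-container workdir
            return shlex.join([manager, "exec", "--workdir", workdir, name, *command])

        yield exec_cmd
    finally:
        # 3. stop the container once, even if an evaluation raised
        run([manager, "stop", name])


# Usage without a container manager: record the commands instead of running them.
calls = []
with persistent_container("my_container_name", "my_image", runner=calls.append) as exec_cmd:
    for sim in ("sim0000", "sim0001"):
        print(exec_cmd(["/opt/entrypoint.sh"], f"/data/evaluations/{sim}"))
print(calls)
```

The `try`/`finally` around the `yield` stops the container once at the end, even if one evaluation fails; the `runner` argument only exists so the sketch can be exercised without a container manager installed.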

That way, a persistent container is spun up once and stays up while the whole optimas/libEnsemble run is ongoing, and all the fragile and costly resource work, like mounting file systems, happens only once. The rest is then done by changing the work dir during each exec (inside the container, and thus relative to a different base path than on the host).

The last remaining challenge: at the moment an individual run is evaluated, we need to know its current, relative simulation evaluation directory as part of the precedent, so that we can change the container workdir (inside the container) to the evaluations/simXYZW/ directory, i.e., the equivalent of cd evaluations/simXYZW/.
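A small sketch of the path bookkeeping involved, assuming the host-side run directory is mounted at `/data` inside the container and that each evaluation directory on the host lies below that run directory. The helper `container_workdir` and the example paths are hypothetical:

```python
from pathlib import PurePosixPath


def container_workdir(host_sim_dir, host_base_dir, container_base_dir="/data"):
    """Map a host-side evaluation directory to the same directory inside the container.

    Assumes host_base_dir is mounted at container_base_dir (hypothetical layout).
    Raises ValueError if host_sim_dir is not below host_base_dir.
    """
    # Relative evaluation dir, e.g. "evaluations/sim0003"; identical on both sides
    rel_sim_dir = PurePosixPath(host_sim_dir).relative_to(host_base_dir)
    # Re-anchor it at the in-container mount point
    return str(PurePosixPath(container_base_dir) / rel_sim_dir)


print(container_workdir("/scratch/me/run1/evaluations/sim0003", "/scratch/me/run1"))
# /data/evaluations/sim0003
```

If an evaluation were launched from its own directory (an assumption here), `host_sim_dir` would simply be `os.getcwd()`; the problem is that the precedent string is fixed before that per-run directory is known.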

Code snippet (from run_grid_scan.py below):

import posixpath
import re

from optimas.evaluators import TemplateEvaluator

precedent = "podman-hpc exec my_container_name /opt/entrypoint.sh"  # usually from an environment variable in the job script

# base dir of the optimas/libEnsemble run
base_dir = "/data/"  # this is a mount point inside the container and generally differs from the host path

# rel_sim_dir = "evaluations/sim0000/"  # current workaround, hard-coded. TODO: generalize to the PWD sim folder that the TemplateEvaluator picks
rel_sim_dir = "%LIBENSEMBLE_SIM_DIR%"  # TODO: before calling srun, libEnsemble would replace `%LIBENSEMBLE_SIM_DIR%` with the sim's run dir

# inject into pre-defined precedent: add `--workdir ...` as needed inside the container
extra_options = f"--workdir {posixpath.join(base_dir, rel_sim_dir)}"  # in-container paths are always POSIX
precedent = re.sub(r"(\s+exec)\s+", rf"\1 {extra_options} ", precedent, count=1)  # only after the first `exec`

ev_main = TemplateEvaluator(
    sim_template="templates/warpx_input_script",
    analysis_func=analysis_func_main,  # defined elsewhere in run_grid_scan.py
    executable="templates/warpx",
    precedent=precedent,
    n_gpus=1,  # GPUs per individual evaluation
    env_mpi="srun",
)
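To make the requested behavior concrete, here is a minimal sketch of the two steps with hypothetical helper names: the `--workdir` option containing the `%LIBENSEMBLE_SIM_DIR%` placeholder is injected once when the evaluator is set up, and the placeholder is substituted per evaluation just before the command is launched. The second step is what libEnsemble would have to do; neither helper is an existing libEnsemble or optimas API.

```python
import re

# Placeholder name proposed in this issue (not an existing libEnsemble feature)
SIM_DIR_PLACEHOLDER = "%LIBENSEMBLE_SIM_DIR%"


def inject_workdir(precedent, workdir):
    """Insert `--workdir <workdir>` right after the first `exec` in the precedent."""
    # A function as replacement avoids backslashes in `workdir` being read as
    # group references; count=1 leaves any later "exec" untouched.
    return re.sub(r"(\s+exec)\s+",
                  lambda m: f"{m.group(1)} --workdir {workdir} ",
                  precedent, count=1)


def expand_sim_dir(command, rel_sim_dir):
    """Per evaluation: replace the placeholder with the run's relative sim dir."""
    return command.replace(SIM_DIR_PLACEHOLDER, rel_sim_dir.strip("/"))


# Once, at setup time
template = inject_workdir("podman-hpc exec my_container_name /opt/entrypoint.sh",
                          f"/data/{SIM_DIR_PLACEHOLDER}")
# Per evaluation, right before launching (e.g. via srun)
for rel in ("evaluations/sim0000/", "evaluations/sim0001/"):
    print(expand_sim_dir(template, rel))
```

A plain string replacement keeps the precedent opaque to libEnsemble: it would not need to know anything about the container manager, only which token to substitute.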

Full Example / Private Repo Context
