Container Workflow Needs WorkDir Variable in Precedent

Hi @shuds13,

As discussed today: we have a nice HPC workflow running with optimas/libEnsemble that can make use of containers for the individually executed commands/runs in `TemplateEvaluator`. Currently, we use `podman-hpc`, but it would work with other HPC-focused container managers, too.

The best practice for doing many runs inside the same container is to:
- start (`run -d`) a container detached
- `exec` individual simulations (1-N times)
- finally stop the container

That way, a persistent container is spun up once and stays alive for the whole optimas/libEnsemble run, and all the fragile and costly resource work, like mounting file systems, happens only once. Each evaluation then only differs in the work-dir passed to `exec` (an in-container path, thus with a different base path than on the host).
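A rough sketch of that lifecycle, written as a helper that only builds the command lines (it does not run them). The image name, the bind-mount paths, and the `sleep infinity` keep-alive command are placeholders of ours, and we assume `podman-hpc` forwards the standard `podman` options `run -d --name -v`, `exec --workdir`, and `stop`:

```py
import shlex


def container_commands(name, image, host_dir, mount_dir, rel_sim_dirs, sim_cmd):
    """Build start / exec / stop command lines for one persistent container.

    All argument values are illustrative; nothing is executed here.
    """
    # 1) start the container once, detached, with the run directory mounted
    start = [
        "podman-hpc", "run", "-d", "--name", name,
        "-v", f"{host_dir}:{mount_dir}",
        image, "sleep", "infinity",  # assumed keep-alive command
    ]
    # 2) one `exec` per evaluation, each with its own in-container workdir
    execs = [
        ["podman-hpc", "exec", "--workdir", f"{mount_dir.rstrip('/')}/{d}",
         name, *shlex.split(sim_cmd)]
        for d in rel_sim_dirs
    ]
    # 3) stop the container at the end of the whole run
    stop = ["podman-hpc", "stop", name]
    return start, execs, stop
```

Only step 2 depends on the per-evaluation directory, which is why the workdir is the one piece of information we are missing at `exec` time.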

The last challenge we have now: at the moment an individual run is evaluated, we need to know the current, *relative* simulation evaluation directory as part of the `precedent`, so that we can change the workdir inside the container to, e.g., `evaluations/simXYZW/`.

## Code snippet (from `run_grid_scan.py` below):
```py
import re

precedent = "podman-hpc exec my_container_name /opt/entrypoint.sh"  # usually from an environment variable in the job script

# base dir of the optimas/libEnsemble run
base_dir = "/data"  # this is a mount point inside the container and generally different from the host path

rel_sim_dir = "evaluations/sim0000/"  # hard-coded example; TODO: generalize to the PWD sim folder that the TemplateEvaluator picks
rel_sim_dir = "%LIBENSEMBLE_SIM_DIR%"  # TODO: before calling srun, libEnsemble would replace `%LIBENSEMBLE_SIM_DIR%` with the sim's run dir

# inject into pre-defined precedent: add `--workdir ...` as needed inside the container
extra_options = f"--workdir {base_dir}/{rel_sim_dir}"
precedent = re.sub(r'(\s+exec)\s+', rf'\1 {extra_options} ', precedent)

ev_main = TemplateEvaluator(
    sim_template="templates/warpx_input_script",
    analysis_func=analysis_func_main,
    executable="templates/warpx",
    precedent=precedent,
    n_gpus=1,  # GPUs per individual evaluation
    env_mpi="srun",
)
```
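To make the request concrete, here is a minimal sketch of the substitution we have in mind. This is not an existing libEnsemble API: the function name `expand_sim_dir`, its arguments, and the placeholder token itself are our proposal, and where exactly such a step would run inside libEnsemble is an open question.

```py
import os

PLACEHOLDER = "%LIBENSEMBLE_SIM_DIR%"  # proposed token, not an existing libEnsemble feature


def expand_sim_dir(precedent, sim_dir, ensemble_dir):
    """Replace the placeholder with the sim dir relative to the ensemble dir.

    Hypothetical helper: both directories are host paths; only the
    relative part is meaningful inside the container.
    """
    rel_sim_dir = os.path.relpath(sim_dir, start=ensemble_dir)
    return precedent.replace(PLACEHOLDER, rel_sim_dir)
```

For example, with an ensemble directory of `/scratch/run` on the host and an evaluation in `/scratch/run/evaluations/sim0003`, the placeholder in `--workdir /data/%LIBENSEMBLE_SIM_DIR%` would expand to `--workdir /data/evaluations/sim0003`.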

## Full Example / Private Repo Context

- https://github.com/BLAST-AI-ML/synapse-bella-staging-injector/pull/2/files 🔒 
- files: `simulation_scripts/templates/run_grid_scan.py` is the optimas script, `simulation_scripts/submission_script_*` are the "outer" job scripts
