MegatronBridge: hf_token validator is too strict — rejects non-gated models and breaks CI #844

@rutayan-nv

Bug Description

MegatronBridgeCmdArgs.validate_hf_token raises a ValidationError when hf_token is empty and HF_TOKEN is not set in the environment, even for models that do not require a HuggingFace token (e.g., Qwen3-235B-A22B, which has gated: False on HuggingFace).

This breaks test_test_definitions.py in CI environments where HF_TOKEN is not set.

Root Cause

The validator in megatron_bridge.py:

@field_validator("hf_token", mode="after", check_fields=False)
def validate_hf_token(cls, v):
    token = (v or "").strip() or os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise ValueError(
            "cmd_args.hf_token is required. Please set HF_TOKEN environment variable ..."
        )
    return token

However, in Megatron-Bridge's argument_parser.py, --hf_token is optional with no default — it is only required for gated models. The mismatch means cloudai enforces a requirement that Megatron-Bridge itself does not.
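To illustrate the mismatch, here is a minimal sketch of how an optional argparse flag with no default behaves. This is not the actual Megatron-Bridge parser: only the shape of --hf_token (optional, no default) is taken from the description above, and the token value is a placeholder.

```python
import argparse

# Hypothetical stand-in parser; only the --hf_token shape follows the issue text.
parser = argparse.ArgumentParser()
parser.add_argument("--hf_token", type=str)  # optional flag, no default given

# Omitting the flag is accepted and the attribute is simply None.
args_without = parser.parse_args([])
# Supplying the flag stores the string as-is.
args_with = parser.parse_args(["--hf_token", "dummy-token"])

print(args_without.hf_token)
print(args_with.hf_token)
```

A parser defined this way never rejects a missing token, so a stricter check upstream in cloudai blocks configurations that the downstream tool would accept.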

Steps to Reproduce

  1. Add a MegatronBridge test TOML without hf_token (or with hf_token = "") for a non-gated model
  2. Run pytest tests/test_test_definitions.py without HF_TOKEN set in the environment
  3. Observe ValidationError: cmd_args.hf_token is required

Example test TOML:

# conf/.../test/my_model.toml
[cmd_args]
model_family_name = "qwen"
model_recipe_name = "qwen3_235b_a22b"
compute_dtype = "bf16"
# no hf_token (model is not gated)

Resulting error:

pydantic_core._pydantic_core.ValidationError: 1 validation error for MegatronBridgeTestDefinition
cmd_args.hf_token
  Value error, cmd_args.hf_token is required. Please set HF_TOKEN environment variable (recommended)
  or cmd_args.hf_token with your actual HF token value.

Expected Behavior

hf_token should be optional. When empty, it should be passed as None to Megatron-Bridge (which already handles None gracefully). The validator should not raise for non-gated models.

Suggested Fix

Make hf_token optional: prefer the configured value, fall back to the HF_TOKEN environment variable, and return None instead of raising when both are empty:

hf_token: str | None = Field(default=None)

@field_validator("hf_token", mode="after")
def validate_hf_token(cls, v):
    token = (v or "").strip() or os.environ.get("HF_TOKEN", "").strip()
    return token or None  # None is valid — Megatron-Bridge handles it
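The resolution logic in the suggested fix can also be pulled into a plain function, which makes it easy to test in CI without HF_TOKEN set. The sketch below is an assumption, not existing cloudai code: the name resolve_hf_token and the injectable env parameter are hypothetical and exist only for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    value: Optional[str], env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the configured token, else HF_TOKEN from env, else None.

    Hypothetical helper mirroring the suggested validator body.
    """
    # Allow tests to inject a fake environment; default to the real one.
    env = os.environ if env is None else env
    # Whitespace-only values count as empty, as in the original validator.
    token = (value or "").strip() or env.get("HF_TOKEN", "").strip()
    # None is a valid result: no token configured, nothing raised.
    return token or None


print(resolve_hf_token(None, {}))
print(resolve_hf_token("", {"HF_TOKEN": " from-env "}))
print(resolve_hf_token("explicit", {"HF_TOKEN": "from-env"}))
```

With this shape, the validator body reduces to a single call, and the "no token anywhere" case is covered by an ordinary unit test rather than depending on the CI environment.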
