shashankv-add-grpo-workplace-asst vs base-20251213 by pjin-nvidia · Pull Request #749 · NVIDIA-NeMo/Gym

pjin-nvidia · 2026-02-23T22:13:39Z

No description provided.

During the collate_samples step in ng_prepare_data, aggregate metrics were missing metadata fields (name, type, jsonl_fpath, num_repeats, gitlab_identifier, license) that were present in the metrics produced by the validate_samples_and_aggregate_metrics step. This caused validation to fail with "Missing key in new metrics" errors.

Root cause:
- validate_samples_and_aggregate_metrics merges dataset config metadata into the metrics using: aggregate_metrics_dict = d.model_dump() | aggregate_metrics_dict
- collate_samples only called aggregate_metrics.model_dump() without merging the metadata, so the validation comparison failed

Fix: Add metadata extraction and merging in collate_samples, mirroring the validate step. For each dataset type, extract the metadata from the first dataset configuration and merge it with the aggregate metrics before validation. This makes the validation and collation metrics formats consistent, so ng_prepare_data completes successfully.

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
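A minimal sketch of the merge-before-validate pattern described above. The dataclasses are hypothetical stand-ins for the Pydantic models in ng_prepare_data (dataclasses.asdict plays the role of model_dump()), and the num_samples field, collate_metrics, and check_keys helpers are illustrative assumptions. Only the merge order (config metadata on the left of |, metrics on the right) comes from the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DatasetConfig:
    # Hypothetical stand-in for a dataset config; field names match the metadata listed above.
    name: str
    type: str
    jsonl_fpath: str
    num_repeats: int
    gitlab_identifier: Optional[str] = None
    license: Optional[str] = None


@dataclass
class AggregateMetrics:
    # Hypothetical metric field; the real aggregate metrics model has its own fields.
    num_samples: int


def merge_metadata(config: DatasetConfig, metrics: AggregateMetrics) -> Dict[str, Any]:
    # Same order as the validate step: config metadata first, metrics win on key clashes.
    return asdict(config) | asdict(metrics)


def collate_metrics(
    configs: List[DatasetConfig], metrics_by_type: Dict[str, AggregateMetrics]
) -> Dict[str, Dict[str, Any]]:
    # For each dataset type, take the metadata from the first config of that type.
    collated: Dict[str, Dict[str, Any]] = {}
    for dataset_type, metrics in metrics_by_type.items():
        first = next(c for c in configs if c.type == dataset_type)
        collated[dataset_type] = merge_metadata(first, metrics)
    return collated


def check_keys(reference: Dict[str, Any], new: Dict[str, Any]) -> None:
    # Illustrative check mirroring the "Missing key in new metrics" failure.
    for key in reference:
        if key not in new:
            raise KeyError(f"Missing key in new metrics: {key}")
```

Before the fix, collation produced only the metrics fields (the equivalent of asdict(metrics) here), which fails check_keys against the merged validate-step output; collate_metrics produces the same keys as the validate step, so the comparison passes.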

…ce Assistant

Add comprehensive tooling and documentation for GRPO training with the Workplace Assistant environment:

1. download_workplace_assistant.py:
   - Downloads nvidia/Nemotron-RL-agent-workplace_assistant from HuggingFace
   - Performs stratified 90/10 train/validation split based on task categories
   - Creates train.jsonl (1129 samples) and validation.jsonl (126 samples)
   - Includes file existence checks to skip redundant downloads
2. TRAINING.md:
   - Complete step-by-step guide from setup to production training
   - Dataset preparation instructions (HuggingFace download + ng_prepare_data)
   - Three training configurations:
     * test_001: Quick 5-step sanity check
     * test_002: Full 1-epoch training (36 steps, single node)
     * Production: Multi-node training with default config
   - Model configuration details (Qwen3-4B-Instruct with Hermes tool parser)
   - Monitoring and troubleshooting guidance
   - Performance optimization tips

Benefits:
- Enables easy reproduction of workplace_assistant GRPO experiments
- Provides clear onboarding for new users
- Documents best practices for agentic tool-use RL training
- Reduces setup time from hours to minutes

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
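The stratified split and skip-if-present behavior described for download_workplace_assistant.py could look roughly like the sketch below. It is not the script itself: records are assumed to be dicts with a hypothetical "category" field holding the task category, the seed and rounding rule are illustrative, and the HuggingFace download step is omitted. Because rounding happens per category, the final split sizes depend on the category counts and need not equal a single global 90/10 cut.

```python
import json
import random
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def stratified_split(
    records: Iterable[Record],
    category_key: str = "category",  # hypothetical field name for the task category
    val_fraction: float = 0.1,
    seed: int = 42,
) -> Tuple[List[Record], List[Record]]:
    # Group by task category so each category contributes ~val_fraction of its rows to validation.
    by_category: Dict[Any, List[Record]] = defaultdict(list)
    for rec in records:
        by_category[rec[category_key]].append(rec)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[Record] = []
    val: List[Record] = []
    for category in sorted(by_category):
        group = list(by_category[category])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def write_jsonl_if_missing(path: Path, records: Iterable[Record]) -> bool:
    # Skip work when the output already exists; returns True only if a file was written.
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return True
```

Stratifying per category keeps rare task categories represented in validation, which a plain random 10% sample could miss.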

- Step-by-step guide for training with NeMo RL and NeMo Gym
- Covers environment setup, data preparation, and training configuration
- Includes single-node and multi-node training instructions
- Documents expected results and troubleshooting tips

- Add actual model input format with system/user messages and tool schemas
- Include real single-step and multi-step task examples from dataset
- Improve prerequisites, hardware requirements, and data preparation sections
- Add HuggingFace dataset link and clarify download/split process

- Remove separate env vars step, inline them in training commands
- Update Slurm examples to use NGC container (nvcr.io/nvidia/nemo-rl:v0.4.0)
- Simplify multi-node section (remove separate submit script)
- Clarify interactive mode runs inside container, batch mode from login node
- Add BFCL v3 evaluation section with NeMo Evaluator reference
- Add placeholder for validation reward plot

- Add comprehensive environment architecture explanation with state-matching verification
- Simplify and reorganize time estimates (overall + per-step)
- Streamline tool categories with examples instead of exhaustive lists
- Consolidate dataset information into flowing prose
- Add note that W&B is optional but recommended
- Update prerequisites with clearer technical requirements
- Fix and validate all code examples against actual implementation

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
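The state-matching verification mentioned in the environment architecture notes can be illustrated with a small sketch. This is a hypothetical illustration, not the Workplace Assistant environment's implementation: the state is assumed to be a plain dict of record lists, tools are assumed to be Python functions that mutate that state, and send_email is an invented example tool. The policy's tool calls and the reference tool calls are replayed on separate copies of the same initial state, and the reward is 1.0 only if the final states match.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Assumed state layout: table name -> list of records (hypothetical).
State = Dict[str, List[Dict[str, Any]]]
# A tool call is (tool_name, keyword_arguments).
ToolCall = Tuple[str, Dict[str, Any]]
Tool = Callable[..., None]


def send_email(state: State, recipient: str, subject: str) -> None:
    """Hypothetical write tool: appends an email record to the state."""
    state.setdefault("emails", []).append({"recipient": recipient, "subject": subject})


def replay(initial: State, calls: List[ToolCall], tools: Dict[str, Tool]) -> State:
    """Apply tool calls to a deep copy so the shared initial state is never mutated."""
    state = copy.deepcopy(initial)
    for name, kwargs in calls:
        tool = tools.get(name)
        if tool is None:
            continue  # assumption: calls to unknown tools have no effect on state
        tool(state, **kwargs)
    return state


def state_match_reward(
    initial: State,
    predicted_calls: List[ToolCall],
    expected_calls: List[ToolCall],
    tools: Dict[str, Tool],
) -> float:
    """Return 1.0 if both call sequences leave the state identical, else 0.0."""
    predicted = replay(initial, predicted_calls, tools)
    expected = replay(initial, expected_calls, tools)
    return 1.0 if predicted == expected else 0.0
```

Comparing end states rather than call sequences credits any sequence of calls that reaches the correct state, for example one that passes arguments in a different order or makes extra calls that do not change the state.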

copy-pr-bot · 2026-02-23T22:13:42Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

shashank3959 added 11 commits December 11, 2025 00:46

Minor updates to the documentation

d4e9a6a

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

Minor docs fixes

6287240

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

Updates to docs

0d0535d

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

Update tutorial: Replace Penguin references with Gym/NeMo-Gym naming

000307f

Update training command

c3ff817

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

pjin-nvidia wants to merge 11 commits into pjin/shashankv-add-grpo-workplace-asst-base-20251213 from dev/shashankv-add-grpo-workplace-asst

2 participants