shashankv-add-grpo-workplace-asst vs base-20251213 #749

Draft
pjin-nvidia wants to merge 11 commits into pjin/shashankv-add-grpo-workplace-asst-base-20251213 from dev/shashankv-add-grpo-workplace-asst

@pjin-nvidia
Contributor

No description provided.

During the collate_samples step in ng_prepare_data, aggregate metrics were
missing metadata fields (name, type, jsonl_fpath, num_repeats,
gitlab_identifier, license) that were present in metrics from the
validate_samples_and_aggregate_metrics step. This caused validation to fail
with 'Missing key in new metrics' errors.

Root cause:
- validate_samples_and_aggregate_metrics merges dataset config metadata into
  metrics using: aggregate_metrics_dict = d.model_dump() | aggregate_metrics_dict
- collate_samples only called aggregate_metrics.model_dump() without merging
  metadata, causing the validation comparison to fail

Fix:
Add metadata extraction and merging in collate_samples, similar to the
validate step. For each dataset type, extract metadata from the first
dataset configuration and merge it with the aggregate metrics before
validation.

This ensures consistency between validation and collation metrics formats,
allowing ng_prepare_data to complete successfully.
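
The mismatch and the fix described above can be illustrated with plain dictionaries. This is a minimal sketch, not the code from this PR: the dicts stand in for the pydantic models' model_dump() output, and the field values and the num_samples metric are hypothetical. Only the metadata key names and the `config | metrics` merge come from the commit message.

```python
# Hypothetical stand-in for d.model_dump() on a dataset config.
dataset_config = {
    "name": "workplace_assistant",
    "type": "train",
    "jsonl_fpath": "data/train.jsonl",
    "num_repeats": 1,
    "gitlab_identifier": None,
    "license": "example-license",
}
# Hypothetical stand-in for aggregate_metrics.model_dump().
aggregate_metrics = {"num_samples": 1129}

# Validate step: metadata is merged in. With the dict union operator
# (Python 3.9+), keys from the right-hand operand win on conflict.
validate_metrics = dataset_config | aggregate_metrics

# Collate step before the fix: metrics only, no metadata.
collate_before_fix = dict(aggregate_metrics)
# Collate step after the fix: same merge as the validate step.
collate_after_fix = dataset_config | aggregate_metrics


def missing_keys(reference, new):
    """Return the keys present in reference but absent from new."""
    return sorted(set(reference) - set(new))


print(missing_keys(validate_metrics, collate_before_fix))
print(missing_keys(validate_metrics, collate_after_fix))
```

The first call lists the six metadata fields named in the commit message, and the second returns an empty list, which is the condition the validation comparison needs.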

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
…ce Assistant

Add comprehensive tooling and documentation for GRPO training with the
Workplace Assistant environment:

1. download_workplace_assistant.py:
   - Downloads nvidia/Nemotron-RL-agent-workplace_assistant from HuggingFace
   - Performs stratified 90/10 train/validation split based on task categories
   - Creates train.jsonl (1129 samples) and validation.jsonl (126 samples)
   - Includes file existence checks to skip redundant downloads

2. TRAINING.md:
   - Complete step-by-step guide from setup to production training
   - Dataset preparation instructions (HuggingFace download + ng_prepare_data)
   - Three training configurations:
     * test_001: Quick 5-step sanity check
     * test_002: Full 1-epoch training (36 steps, single node)
     * Production: Multi-node training with default config
   - Model configuration details (Qwen3-4B-Instruct with Hermes tool parser)
   - Monitoring and troubleshooting guidance
   - Performance optimization tips
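
The stratified 90/10 split and the file existence check listed under download_workplace_assistant.py can be sketched as follows. This is an illustration under assumptions, not the script itself: the function names, the "category" field, the fixed seed, and the per-category rounding rule are all hypothetical, so the sketch is not guaranteed to reproduce the exact 1129/126 counts.

```python
import json
import random
from collections import defaultdict
from pathlib import Path


def stratified_split(samples, key, val_fraction=0.1, seed=0):
    """Split samples so each category sends about val_fraction to validation."""
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample[key]].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for category in sorted(by_category):
        group = by_category[category]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)  # assumed rounding rule
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def write_jsonl(path, rows):
    """Write rows as JSON Lines unless the file already exists."""
    path = Path(path)
    if path.exists():
        return False  # skip redundant work, as the script is described to do
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return True


# Toy data with a hypothetical "category" field.
samples = [{"id": i, "category": c}
           for i, c in enumerate(["email"] * 100 + ["calendar"] * 50 + ["crm"] * 50)]
train, val = stratified_split(samples, key="category")
print(len(train), len(val))
```

Splitting within each category, rather than over the whole dataset, keeps small task categories represented in validation.jsonl in roughly the same proportion as in train.jsonl.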

Benefits:
- Enables easy reproduction of workplace_assistant GRPO experiments
- Provides clear onboarding for new users
- Documents best practices for agentic tool-use RL training
- Reduces setup time from hours to minutes

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
- Step-by-step guide for training with NeMo RL and NeMo Gym
- Covers environment setup, data preparation, and training configuration
- Includes single-node and multi-node training instructions
- Documents expected results and troubleshooting tips
- Add actual model input format with system/user messages and tool schemas
- Include real single-step and multi-step task examples from dataset
- Improve prerequisites, hardware requirements, and data preparation sections
- Add HuggingFace dataset link and clarify download/split process
- Remove separate env vars step, inline them in training commands
- Update Slurm examples to use NGC container (nvcr.io/nvidia/nemo-rl:v0.4.0)
- Simplify multi-node section (remove separate submit script)
- Clarify interactive mode runs inside container, batch mode from login node
- Add BFCL v3 evaluation section with NeMo Evaluator reference
- Add placeholder for validation reward plot
- Add comprehensive environment architecture explanation with state-matching verification
- Simplify and reorganize time estimates (overall + per-step)
- Streamline tool categories with examples instead of exhaustive lists
- Consolidate dataset information into flowing prose
- Add note that W&B is optional but recommended
- Update prerequisites with clearer technical requirements
- Fix and validate all code examples against actual implementation
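
As a rough illustration of the state-matching verification mentioned in the list above, the sketch below replays a predicted and a reference sequence of tool calls against copies of the same initial state and rewards the rollout only if the final states are equal. The tool functions, the state layout, and the binary reward are hypothetical and are not taken from the Workplace Assistant implementation.

```python
import copy


def send_email(state, recipient, subject):
    """Toy tool: record an outgoing email in the state."""
    state["emails"].append({"to": recipient, "subject": subject})


def delete_event(state, event_id):
    """Toy tool: remove a calendar event by id."""
    state["events"] = [e for e in state["events"] if e["id"] != event_id]


# Hypothetical tool registry keyed by tool name.
TOOLS = {"send_email": send_email, "delete_event": delete_event}


def run_calls(initial_state, calls):
    """Apply tool calls in order to a deep copy of the initial state."""
    state = copy.deepcopy(initial_state)
    for call in calls:
        TOOLS[call["name"]](state, **call["arguments"])
    return state


def state_match_reward(initial_state, predicted_calls, reference_calls):
    """Return 1.0 if both call sequences end in the same state, else 0.0."""
    predicted = run_calls(initial_state, predicted_calls)
    reference = run_calls(initial_state, reference_calls)
    return 1.0 if predicted == reference else 0.0


initial = {"emails": [], "events": [{"id": 7, "title": "standup"}]}
email = {"name": "send_email",
         "arguments": {"recipient": "a@example.com", "subject": "Hi"}}
delete = {"name": "delete_event", "arguments": {"event_id": 7}}

# Independent actions in a different order still reach the same final state.
print(state_match_reward(initial, [delete, email], [email, delete]))
```

Comparing final states instead of the literal call sequences means a rollout is not penalized for taking a different but equivalent path to the same outcome.
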
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
