shashankv-add-grpo-workplace-asst vs base-20251213 #749
Draft
pjin-nvidia wants to merge 11 commits into pjin/shashankv-add-grpo-workplace-asst-base-20251213 from
During the collate_samples step in ng_prepare_data, aggregate metrics were missing metadata fields (name, type, jsonl_fpath, num_repeats, gitlab_identifier, license) that were present in metrics from the validate_samples_and_aggregate_metrics step. This caused validation to fail with 'Missing key in new metrics' errors.
Root cause:
- validate_samples_and_aggregate_metrics merges dataset config metadata into metrics using: aggregate_metrics_dict = d.model_dump() | aggregate_metrics_dict
- collate_samples only called aggregate_metrics.model_dump() without merging metadata, causing the validation comparison to fail
Fix:
Add metadata extraction and merging in collate_samples, similar to the validate step. For each dataset type, extract metadata from the first dataset configuration and merge it with the aggregate metrics before validation.
This ensures consistency between validation and collation metrics formats, allowing ng_prepare_data to complete successfully.
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
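The mismatch described in this commit message can be reproduced in isolation. The sketch below is not the ng_prepare_data code: `DatasetConfig`, `AggregateMetrics`, `missing_keys`, and the `num_samples` field are hypothetical stand-ins, and dataclasses with `asdict` replace the pydantic `model_dump()` call so the example runs on the standard library alone (Python 3.9+ for the dict `|` operator). Only the six metadata field names and the `metadata | metrics` merge order are taken from the message.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DatasetConfig:
    """Hypothetical stand-in for a dataset config; field names come from the commit message."""
    name: str
    type: str
    jsonl_fpath: str
    num_repeats: int
    gitlab_identifier: Optional[str] = None
    license: Optional[str] = None


@dataclass
class AggregateMetrics:
    """Hypothetical stand-in for the aggregate metrics object."""
    num_samples: int  # illustrative metric, not taken from the source


def missing_keys(old: dict, new: dict) -> list:
    """Keys present in the old metrics but absent from the new ones."""
    return sorted(set(old) - set(new))


config = DatasetConfig(name="demo", type="train", jsonl_fpath="train.jsonl", num_repeats=1)
metrics = AggregateMetrics(num_samples=3)

# Validate step: metadata merged first, so metric values win on any key clash.
validate_metrics = asdict(config) | asdict(metrics)

# Collate step before the fix: metrics only, no metadata.
collate_before = asdict(metrics)

# Collate step after the fix: same merge as the validate step.
collate_after = asdict(config) | asdict(metrics)

print(missing_keys(validate_metrics, collate_before))
print(missing_keys(validate_metrics, collate_after))
```

With `|`, the right-hand operand takes precedence on duplicate keys, so placing the metadata on the left keeps computed metric values authoritative if a name ever collides.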
…ce Assistant
Add comprehensive tooling and documentation for GRPO training with the
Workplace Assistant environment:
1. download_workplace_assistant.py:
- Downloads nvidia/Nemotron-RL-agent-workplace_assistant from HuggingFace
- Performs stratified 90/10 train/validation split based on task categories
- Creates train.jsonl (1129 samples) and validation.jsonl (126 samples)
- Includes file existence checks to skip redundant downloads
2. TRAINING.md:
- Complete step-by-step guide from setup to production training
- Dataset preparation instructions (HuggingFace download + ng_prepare_data)
- Three training configurations:
* test_001: Quick 5-step sanity check
* test_002: Full 1-epoch training (36 steps, single node)
* Production: Multi-node training with default config
- Model configuration details (Qwen3-4B-Instruct with Hermes tool parser)
- Monitoring and troubleshooting guidance
- Performance optimization tips
Benefits:
- Enables easy reproduction of workplace_assistant GRPO experiments
- Provides clear onboarding for new users
- Documents best practices for agentic tool-use RL training
- Reduces setup time from hours to minutes
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
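As an illustration of the stratified 90/10 split mentioned for download_workplace_assistant.py, here is a minimal standard-library sketch. It is not the script from this PR: the `stratified_split` helper, the `category` field name, the category labels, the seed, and the per-category rounding rule are all assumptions, so it will not necessarily reproduce the 1129/126 counts quoted above.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, key, val_fraction=0.1, seed=42):
    """Split samples so each category contributes about val_fraction to validation.

    `key` maps a sample to its category. Rounding is applied per category,
    which is an assumption rather than the PR's documented behaviour.
    """
    by_category = defaultdict(list)
    for sample in samples:
        by_category[key(sample)].append(sample)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for category in sorted(by_category):  # sorted for a deterministic order
        group = list(by_category[category])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation


# Hypothetical toy data: three task categories of different sizes.
samples = (
    [{"id": i, "category": "email"} for i in range(100)]
    + [{"id": i, "category": "calendar"} for i in range(100, 150)]
    + [{"id": i, "category": "analytics"} for i in range(150, 180)]
)

train, validation = stratified_split(samples, key=lambda s: s["category"])
val_counts = Counter(s["category"] for s in validation)
print(len(train), len(validation), dict(val_counts))
```

Splitting within each category, rather than over the shuffled dataset as a whole, keeps rare task categories represented in the validation set in roughly the same proportion as in training.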
- Step-by-step guide for training with NeMo RL and NeMo Gym
- Covers environment setup, data preparation, and training configuration
- Includes single-node and multi-node training instructions
- Documents expected results and troubleshooting tips
- Add actual model input format with system/user messages and tool schemas
- Include real single-step and multi-step task examples from dataset
- Improve prerequisites, hardware requirements, and data preparation sections
- Add HuggingFace dataset link and clarify download/split process
- Remove separate env vars step, inline them in training commands
- Update Slurm examples to use NGC container (nvcr.io/nvidia/nemo-rl:v0.4.0)
- Simplify multi-node section (remove separate submit script)
- Clarify interactive mode runs inside container, batch mode from login node
- Add BFCL v3 evaluation section with NeMo Evaluator reference
- Add placeholder for validation reward plot
- Add comprehensive environment architecture explanation with state-matching verification
- Simplify and reorganize time estimates (overall + per-step)
- Streamline tool categories with examples instead of exhaustive lists
- Consolidate dataset information into flowing prose
- Add note that W&B is optional but recommended
- Update prerequisites with clearer technical requirements
- Fix and validate all code examples against actual implementation
Signed-off-by: Shashank Verma <shashank3959@gmail.com>
No description provided.