Feature/nvidia if bench validators integrations #801
Open
abubakaria56 wants to merge 37 commits into NVIDIA-NeMo:main from
Signed-off-by: Dhrutisundar Sahoo <dhrutisundar.sahoo@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
…flagging Flagging validation issues and writing them into error.json Signed-off-by: Dhrutisundar Sahoo <dhrutisundar.sahoo@turing.com>
Signed-off-by: Dhrutisundar Sahoo <dhrutisundar.sahoo@turing.com>
…lti-lang-support [IFTL-218] Multi-Lang Support Signed-off-by: Dhrutisundar Sahoo <dhrutisundar.sahoo@turing.com>
Signed-off-by: qasimo-debug <qasim.o@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Signed-off-by: qasimo-debug <qasim.o@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Signed-off-by: qasimo-debug <qasim.o@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Signed-off-by: qasimo-debug <qasim.o@turing.com>
…r-turing/Nvidia-gym-turing into fixes/lang_validator Signed-off-by: qasimo-debug <qasim.o@turing.com>
Signed-off-by: qasimo-debug <qasim.o@turing.com>
…validator Fixes/lang validator Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Remove the "Where Do Reward Scores Come From?" note that implied custom verification logic is optional. Also fix tutorial goals to match actual content and correct the resource server name. Fixes NVIDIA-NeMo#776 Signed-off-by: Chris Wing <cwing@nvidia.com>
change tutorial card est time from 45-90 to 30 mins as in the tutorial itself NVIDIA-NeMo#780 Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com>
…t tutorials section (NVIDIA-NeMo#785) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: bxyu-nvidia <bxyu@nvidia.com>
5927179 Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
- Extract _preprocess_rows_from_config from duplicate run_from_config - Add missing imports: json, deepcopy, Union, Literal - Add return results to run_from_config - Remove dead _post_coroutine block (undefined server_client reference) Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com> Made-with: Cursor Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com> Made-with: Cursor
…alidators-integrations
…rtifacts - Remove # pragma: no cover from RolloutCollectionHelper class - Drop stale how_to_start.md entry from .gitignore - Delete tracked example_rollouts.jsonl generated artifact Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com> Made-with: Cursor
Auto-generated by update-readme-table pre-commit hook. Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com> Made-with: Cursor
Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com> Made-with: Cursor
…s.json Rollouts that return a schema_validation or language_compatibility error are now excluded from results.jsonl, reward profiling, and agent metrics. TuringVIFVerifyResponse gains a should_skip_rollout flag (set to True on those two error paths) which rollout_collection.py reads to route the result into errors.json instead of the main output. On resume_from_cache, already-errored rollout keys are loaded from errors.json so they are not re-run. The finish summary now prints successful vs skipped counts. Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
…th inline comments Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
…s_rows_from_config Signed-off-by: Al-Waasiu Abubakari <abubakari.a@turing.com>
Turing VIF
A validation service in the NeMo Gym pipeline that answers: Did the model’s reply follow all the instructions? It does not generate text; it only grades one reply against a list of instructions and returns pass/fail and a reward (0 or 1) for training.
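As a rough illustration of that contract, the sketch below grades one reply against a list of instruction checks and derives the all-or-nothing reward. The `Instruction` type, the `grade` function, and the two example checks are hypothetical and are not taken from the Turing VIF code; only the output field names `reward`, `follow_all_instructions`, and `follow_instruction_list` come from this description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instruction:
    # Hypothetical stand-in for a validator: an ID plus a predicate on the reply.
    instruction_id: str
    check: Callable[[str], bool]


def grade(reply: str, instructions: List[Instruction]) -> dict:
    """Grade one reply; the reward is 1.0 only if every instruction passes."""
    follow_list = [inst.check(reply) for inst in instructions]
    follow_all = all(follow_list)
    return {
        "reward": 1.0 if follow_all else 0.0,
        "follow_all_instructions": follow_all,
        "follow_instruction_list": follow_list,
    }


# Two illustrative checks (assumptions, not real Turing VIF validators).
instructions = [
    Instruction("max_words_10", lambda r: len(r.split()) <= 10),
    Instruction("ends_with_period", lambda r: r.strip().endswith(".")),
]

# The reply is short enough but lacks a final period, so one check fails.
result = grade("Short answer without a period", instructions)
```

Because the reward is binary, a single failed instruction is enough to bring it to 0.0, while the per-instruction list still shows which checks passed.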
Two Kinds of Validators
Multi-language support
Turing VIF supports multiple languages: English (en), Spanish (es), French (fre), German (de), Italian (it), Brazilian Portuguese (pt-BR), Japanese (ja), Chinese (zh), Korean (ko), Hindi (hi), and Arabic (ar). The language is set per request.
What Happens When It Runs
Results, errors, and skipping rollouts
- `reward` (1.0 or 0.0)
- `follow_all_instructions`
- `follow_instruction_list` (one boolean per instruction)
- `validation_results` (per instruction: ID, status Passed/Failed/Skipped, message)

Training uses `reward` and the list; `validation_results` are for debugging and evals.

- Request-level errors (`schema_validation`, `language_compatibility`): `validation_results` with status Failed → rollout skipped (not used for training; recorded e.g. in `errors.json`).
- Per-instruction failures (valid request, some instructions fail) → reward 0, `validation_results` Passed/Failed per instruction → rollout not skipped (still used for training with reward 0).

Why `rollout_collection.py` was changed
The original breaks when a rollout returns a `schema_validation` error (malformed instruction payload) or a `language_compatibility` error (instruction not supported for the requested language). Without sorting input rows by `(task_index, rollout_index)` before writing the materialized-inputs file, index assignments become inconsistent when rollouts are skipped, causing `resume_from_cache` to re-run completed rollouts or skip pending ones. The updated version sorts rows upfront so indices are stable, skipped rollouts are cleanly excluded from training, and errors are flagged in the verify response (`should_skip_rollout=True`) so the collector writes them to `errors.json` instead of the output JSONL. This keeps `results.jsonl`, `_reward_profiling.jsonl`, and `_agent_metrics.jsonl` limited to successful rollouts only.

Known test failures in `responses_api_agents/swe_agents`
Two pre-existing issues unrelated to Turing VIF:
- `test_find_container_lowercase_match`: `_find_container` returns the mixed-case constructed path instead of the actual lowercase filename on disk after applying the `__` → `_1776_` substitution.
- `TestSetupSwebenchEnvironment`, `TestSetupR2eGymEnvironment`, `TestSetupOpenhandsEnvironment` (`test_already_exists`): macOS-only failure. `tempfile.TemporaryDirectory()` returns `/var/folders/...` but the setup functions resolve symlinks, returning `/private/var/folders/...`, so the path equality check fails.
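The macOS path mismatch can be reproduced on other platforms with an explicit symlink. The sketch below is only an illustration: `setup_environment` is a hypothetical stand-in for the setup functions (assumed here to call `os.path.realpath`), and the `link` directory plays the role of `/var` pointing at `/private/var`. It shows why comparing the raw path fails and how resolving both sides before comparing would avoid the failure.

```python
import os
import tempfile


def setup_environment(path: str) -> str:
    # Hypothetical stand-in: like the setup functions described above,
    # it resolves symlinks before returning the path.
    return os.path.realpath(path)


with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "real")
    os.mkdir(real_dir)

    # "link" -> "real" mimics macOS, where /var is a symlink to /private/var.
    link_dir = os.path.join(tmp, "link")
    os.symlink(real_dir, link_dir)

    returned = setup_environment(link_dir)

    # Naive check: compares the resolved path with the unresolved one.
    naive_equal = returned == link_dir

    # More robust check: resolve the expected path as well before comparing.
    robust_equal = returned == os.path.realpath(link_dir)
```

In a test, resolving the temporary directory once (for example with `os.path.realpath`) before building expected paths is one possible way to make the assertion platform-independent; whether that fits the existing tests is left to the maintainers.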