
[AUTO-MERGE] Enable training from pre-trained model weights (fine-tuning support)#752

Merged
drewoldag merged 8 commits into main from copilot/enable-pretrained-model-loading
Mar 6, 2026

Conversation

Contributor

Copilot AI commented Mar 4, 2026

Change Description

Adds model_weights_file to the [train] config to allow loading pre-trained weights before training begins — enabling fine-tuning and transfer learning workflows. This is distinct from resume, which restores a full checkpoint (optimizer state, epoch counter).

Solution Description

  • hyrax_default_config.toml — Added model_weights_file = false directly after the resume key in [train], with a concise 2-line comment clarifying its purpose and mutual exclusivity with resume. The key is placed adjacent to resume to make their relationship clear.

  • train.py — Two additions to Train.run():

    • Early ValueError if both resume and model_weights_file are set simultaneously (fails before any dataset loading or directory creation).
    • Calls existing load_model_weights(config, model, "train") after setup_model() but before create_trainer() — critical to avoid key mismatches from idist.auto_model wrapping. The two colorama-styled log lines confirming the weights path and fine-tuning mode are emitted after load_model_weights succeeds, ensuring no misleading output if loading fails.
  • test_train.py — Three new tests: conflict raises ValueError, successful fine-tuning run, default value is False.
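The ordering constraint described above (load weights after setup_model() but before create_trainer()) can be sketched with stub functions; setup_model, load_model_weights, and create_trainer are names from the PR, while the glue code here is purely illustrative:

```python
# Illustrative sketch of the call ordering in Train.run(); the stubs record
# the sequence so the constraint is easy to see. Not the real implementation.
calls = []

def setup_model(config, dataset):
    calls.append("setup_model")
    return {"name": "model"}          # stand-in for the real model object

def load_model_weights(config, model, mode):
    calls.append("load_model_weights")  # must see the un-wrapped model

def create_trainer(model, config):
    calls.append("create_trainer")    # real code wraps model via idist.auto_model here
    return "trainer"

config = {"train": {"model_weights_file": "/path/to/weights.pth", "resume": False}}
model = setup_model(config, None)
if config["train"]["model_weights_file"]:
    load_model_weights(config, model, "train")
trainer = create_trainer(model, config)
assert calls == ["setup_model", "load_model_weights", "create_trainer"]
```

Loading before create_trainer means the state dict keys match the bare model, before any distributed wrapper can rename parameters.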

Usage:

h = Hyrax()
h.config["train"]["model_weights_file"] = "/path/to/pretrained_weights.pth"
h.train()  # Loads weights, starts from epoch 1 with fresh optimizer

Setting both resume and model_weights_file raises immediately:

ValueError: Cannot set both `resume` and `model_weights_file` in the [train] config. ...
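The mutual-exclusivity check could look roughly like the following; the real check lives in Train.run() and the exact message may differ:

```python
# Hypothetical sketch of the early mutual-exclusivity check, assuming a dict
# mirroring the [train] config table. Illustrative only.
def validate_train_config(config: dict) -> None:
    train = config["train"]
    if train.get("resume") and train.get("model_weights_file"):
        raise ValueError(
            "Cannot set both `resume` and `model_weights_file` in the [train] "
            "config: `resume` restores a full checkpoint, while "
            "`model_weights_file` loads only model parameters for fine-tuning."
        )

# A falsy value for either key (the defaults) passes validation.
validate_train_config({"train": {"resume": False, "model_weights_file": "w.pth"}})
```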

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation
Original prompt

This section details the original issue you should resolve

<issue_title>Enable training starting with a pre-trained model</issue_title>
<issue_description>Feature request
The user should be able to provide the path to a model weights file via the config, which would instruct Hyrax to load the model weights using the existing load_model_weights function.

The key in the config should be under the train table and should be called model_weights_file.
Its value should be false by default.
It should have a descriptive comment stating that, if its value is not false, it should be a path to a model weights file.

An error should be raised quickly if the user specifies both a resume (checkpoint path) in the config and a model_weights_file in the config.

There should be some logging associated with using a pre-trained weights file if one is provided to ensure that the user is aware of what is happening. </issue_description>

<agent_instructions>@copilot the second comment in the issue is the plan created by the PM agent. Please follow that plan for implementing this feature.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@drewoldag Plan: Add `model_weights_file` to `[train]` Config

TL;DR: Add a model_weights_file key to the [train] config section so users can load pre-trained model weights (for fine-tuning / transfer learning) separately from full checkpoint resuming (resume). The two keys serve distinct purposes—resume restores the full training state (optimizer, scheduler, epoch counter), while model_weights_file loads only model parameters and starts training fresh. An early ValueError prevents setting both simultaneously.

Steps

  1. Add config key in src/hyrax/hyrax_default_config.toml — insert model_weights_file = false in the [train] table (after the existing resume entry). Comment should read something like:

    # Path to a pre-trained model weights file for fine-tuning. Training will start
    # from epoch 1 with a fresh optimizer, using these weights as the initial parameters.
    # If `false`, training starts from randomly initialized weights (default behavior).
    # Cannot be used together with `resume` — use `resume` to continue from a full checkpoint.
    
  2. Add early validation in src/hyrax/verbs/train.py — at the top of Train.run(), immediately after config = self.config, add a check: if both config["train"]["resume"] and config["train"]["model_weights_file"] are truthy, raise a ValueError with a clear message explaining the difference and asking the user to pick one. This runs before any expensive dataset loading or directory creation.

  3. Load pre-trained weights in src/hyrax/verbs/train.py — after model = setup_model(config, dataset["train"]) (line ~72) and before create_trainer(...) (line ~141). If config["train"]["model_weights_file"] is truthy, call the existing load_model_weights(config, model, "train") from src/hyrax/models/model_utils.py. This must happen before create_trainer because create_trainer wraps the model with idist.auto_model (distributed wrapper), which can alter parameter key names. Loading weights into the un-wrapped model avoids key mismatches.

  4. Add logging — in Train.run(), after the load_model_weights call succeeds, log a colorama-styled message (matching the existing pattern like {Style.BRIGHT}{Fore.BLACK}{Back.GREEN}...{Style.RESET_ALL}) saying something like:

    Loading pre-trained weights: /path/to/weights.pth
    Training will start from epoch 1 with a fresh optimizer (fine-tuning mode).

    This makes it unmistakable to the user what's happening and how it differs from resume.

  5. Handle the load_model_weights fallback behavior — the existing function in model_utils.py falls back to auto-discovering the most recent training results when model_weights_file is falsy. Since step 3 only calls the function when the value is truthy, this fallback won't trigger. No changes to load_model_weights are needed.

  6. Add tests in tests/hyrax/test_train.py — three new test functions:

    • test_train_raises_on_resume_and_model_weights_file — set both config["train"]["resume"] and config["train"]["model_weights_file"] to non-false values, call h.train(), assert ValueError is raised with an appropriate message. Follow the existing loopback_hyrax fixture pattern.
    • test_train_with_pretrained_weights — run a first training to produce a weights file, then set config["train"]["model_weights_file"] to that weights file path, run training again, assert it completes successfully. Verify the model was loaded from the sp...
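The key-mismatch hazard from step 3 can be illustrated with plain dicts (no PyTorch needed); the "module." prefix is the common convention used by distributed wrappers, assumed here for illustration:

```python
# Toy illustration of why weights must load before the model is wrapped:
# distributed wrappers such as idist.auto_model can prefix parameter names
# (commonly "module."), so a state dict saved from a bare model no longer
# lines up with the wrapped model's keys.
bare_keys = ["conv1.weight", "conv1.bias"]
wrapped_keys = ["module." + k for k in bare_keys]

saved_state = dict.fromkeys(bare_keys, 0.0)
assert set(saved_state) == set(bare_keys)     # loads cleanly into the bare model
assert set(saved_state) != set(wrapped_keys)  # mismatch once the model is wrapped
```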


Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Enable training with pre-trained model weights" to "Enable training from pre-trained model weights (fine-tuning support)" Mar 4, 2026
Collaborator

@drewoldag drewoldag left a comment


@copilot I left a couple of comments. Please resolve them then request another review.

Contributor Author

Copilot AI commented Mar 5, 2026

@copilot I left a couple of comments. Please resolve them then request another review.

Both comments have been resolved in c92c118. downloadCutout.py was reverted, and the model_weights_file config comment was shortened to 2 lines and moved to be adjacent to the resume key.

Collaborator

@drewoldag drewoldag left a comment


This looks good to me.

@drewoldag drewoldag marked this pull request as ready for review March 5, 2026 00:28
Contributor

Copilot AI left a comment


Pull request overview

This PR adds fine-tuning / transfer learning support by introducing a model_weights_file key under the [train] config section. Users can now point to pre-trained weights that will be loaded before training begins (using only model parameters, not optimizer state), which is distinct from resume (full checkpoint restore). An early ValueError is raised if both are set simultaneously.

Changes:

  • New model_weights_file = false config key in [train] with descriptive comment
  • Mutual-exclusivity validation and conditional weight loading (before create_trainer) added to Train.run()
  • Three new tests covering the conflict error, a full fine-tuning run, and the default value

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/hyrax/hyrax_default_config.toml Adds model_weights_file = false key with comment in [train] section
src/hyrax/verbs/train.py Adds conflict validation, conditional weight loading with colorama logging in Train.run()
tests/hyrax/test_train.py Adds three tests: conflict raises ValueError, successful fine-tuning run, default value is False


@codecov

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.70%. Comparing base (56390c8) to head (18a7cc0).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
+ Coverage   64.66%   64.70%   +0.04%     
==========================================
  Files          61       61              
  Lines        5881     5888       +7     
==========================================
+ Hits         3803     3810       +7     
  Misses       2078     2078              

☔ View full report in Codecov by Sentry.

Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
@github-actions

github-actions bot commented Mar 5, 2026

Before [56390c8] After [a430b28] Ratio Benchmark (Parameter)
failed failed n/a data_cache_benchmarks.DataCacheBenchmarks.time_preload_cache_hsc1k
failed failed n/a data_cache_benchmarks.DataCacheBenchmarks.track_cache_hsc1k_hyrax_size_undercount
failed failed n/a data_request_benchmarks.DatasetRequestBenchmarks.time_request_all_data
435±4ms 447±1ms 1.03 vector_db_benchmarks.VectorDBSearchBenchmarks.time_search_by_vector_many_shards(128, 'qdrant')
39.8±0.5ms 40.4±0.4ms 1.02 benchmarks.time_nb_obj_dir
38.5±0.6ms 38.9±0.9ms 1.01 benchmarks.time_nb_obj_construct
1.95±0.01s 1.96±0.02s 1.01 benchmarks.time_umap_help
3.78G 3.81G 1.01 vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(16384, 'qdrant')
24.6±0.04s 24.8±0s 1.01 vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(16384, 'chromadb')
1.96±0.02s 1.97±0.02s 1.00 benchmarks.time_download_help

Click here to view all benchmarks.

Collaborator

@aritraghsh09 aritraghsh09 left a comment


I like the simple, clean implementation!

@drewoldag drewoldag enabled auto-merge (squash) March 6, 2026 06:19
@drewoldag drewoldag changed the title from "Enable training from pre-trained model weights (fine-tuning support)" to "[AUTO-MERGE] Enable training from pre-trained model weights (fine-tuning support)" Mar 6, 2026
@drewoldag drewoldag merged commit 4ab8737 into main Mar 6, 2026
6 of 7 checks passed
@drewoldag drewoldag deleted the copilot/enable-pretrained-model-loading branch March 6, 2026 06:24


Development

Successfully merging this pull request may close these issues.

Enable training starting with a pre-trained model

4 participants