Conversation
There was a problem hiding this comment.
Pull request overview
This pull request significantly enhances the ASF (Algorithm Selection Framework) with major documentation additions, code refactoring, and CLI improvements. Despite the title "CLI Improvement" and description "Just 2 small fixes", this PR contains extensive changes across 11 files.
Changes:
- Comprehensive documentation for all algorithm selectors (789 lines covering 19 selectors with detailed descriptions, parameters, and usage guidance)
- Major refactoring of the SNNAP selector from k-NN with Euclidean distance to a Jaccard-distance-based approach using per-algorithm regression models
- CLI enhancements including ASlib scenario support, proper boolean flag handling, and improved data alignment
- API improvements including parameter renaming in SATzilla for clarity, random_state addition to SurvivalAnalysis, type hint corrections, and fallback handling in ISAC
Reviewed changes
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| mkdocs.yml | Adds navigation entry for the new selectors documentation page |
| docs/features/selectors.md | New comprehensive documentation covering all 19 algorithm selectors with parameters, workflows, and usage guidance |
| asf/selectors/survival_analysis.py | Adds configurable random_state parameter for differential evolution optimization |
| asf/selectors/snnap.py | Major refactoring: replaces k-NN with Euclidean distance with Jaccard distance on predicted top-n algorithms; uses per-algorithm regression models on z-score normalized performance |
| asf/selectors/satzilla.py | Renames parameter from model_class to label_classifier_model and adds epm_regressor_model for clarity; updates hyperparameter definitions |
| asf/selectors/rpc_selector.py | Adds type hint Any to **kwargs parameter |
| asf/selectors/parallel_portfolio_selector.py | Adds type hint Any to **kwargs parameter |
| asf/selectors/meta_selector.py | Adds comprehensive docstring explaining parameters and requirements |
| asf/selectors/isac.py | Adds global fallback algorithm to handle edge cases where clusters have no valid performance data |
| asf/selectors/hybrid_decision_tree.py | Adds type hint Any to **kwargs parameter and imports Any from typing |
| asf/selectors/abstract_epm_based_selector.py | Simplifies parameter filtering logic by removing unused em_* and use_log10 parameters |
| asf/cli/cli_train.py | Major improvements: adds ASlib scenario support, fixes boolean flag handling with mutually exclusive groups, adds preprocessor/presolver choices, implements data alignment checks, adds random_state parameter for tuning |
Comments suppressed due to low confidence (5)
asf/cli/cli_train.py:131
- The maximize argument handling has changed from accepting a boolean type to using mutually exclusive action flags. However, the default is now None for both --maximize and --minimize, which means when neither flag is provided, the maximize variable will be None. Later in the code (line 387 and 402), there's logic to handle this None case, but it might be clearer to set a proper default value or document this behavior more explicitly to avoid confusion.
maximize_group = parser.add_mutually_exclusive_group()
maximize_group.add_argument(
"--maximize",
action="store_true",
default=None,
help="Maximize the objective",
)
maximize_group.add_argument(
"--minimize",
action="store_false",
dest="maximize",
default=None,
help="Minimize the objective (default)",
)
asf/cli/cli_train.py:378
- The error message should be updated from "--feature-data and --performance-data are required" to something like "--feature-data and --performance-data are required when --aslib-scenario is not provided" to be more accurate about when these arguments are needed.
if args.feature_data is None or args.performance_data is None:
parser.error("--feature-data and --performance-data are required")
asf/cli/cli_train.py:51
- The preprocessor_list construction may include private or internal sklearn classes that start with underscores or are not intended for direct use. Consider adding a filter to exclude names starting with underscore (e.g.,
and not name.startswith('_')) to avoid exposing internal implementation details.
preprocessor_list = sorted(
name
for name in dir(preprocessing)
if isinstance(getattr(preprocessing, name), type)
and issubclass(getattr(preprocessing, name), TransformerMixin)
)
asf/cli/cli_train.py:446
- After aligning features and performance DataFrames to their common index (lines 445-446), the features_running_time DataFrame (if it exists from ASlib scenario) may still have the original, unaligned index. This could cause index mismatch issues when features_running_time is passed to tune_selector. Consider also aligning features_running_time after the index alignment: if features_running_time is not None: features_running_time = features_running_time.loc[common_index]
if not features.index.equals(performance.index):
common_index = features.index.intersection(performance.index)
if common_index.empty:
parser.error(
"features and performance indices do not overlap; check your input files"
)
print(
"[WARN] Aligning features/performance indices to shared instances;"
f" kept {len(common_index)} rows"
)
features = features.loc[common_index]
performance = performance.loc[common_index]
asf/cli/cli_train.py:426
- If presolver_names is not empty but presolver_budget is 0 (which can happen when presolver_ratio is 0 or very small), the division on line 422 could result in each presolver having a budget of 0. While this doesn't cause a runtime error, it's semantically incorrect to initialize presolvers with zero budget. Consider checking if presolver_budget > 0 before creating presolver instances, or skipping presolver instantiation when the budget is insufficient.
presolver_classes = []
if presolver_names:
presolver_per_budget = presolver_budget / len(presolver_names)
presolver_classes = [
getattr(presolving, name)(budget=presolver_per_budget)
for name in presolver_names
]
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## main #146 +/- ##
==========================================
+ Coverage 84.02% 84.03% +0.01%
==========================================
Files 91 91
Lines 7644 7643 -1
==========================================
Hits 6423 6423
+ Misses 1221 1220 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Just 2 small fixes