feat: rewrite model builder to use guardrailed workflow #161
Pull request overview
This PR replaces the prior “library-style” model builder internals with a new guardrailed, workflow-oriented implementation, and adds new integration/storage building blocks (notably S3) plus updated examples and containerization.
Changes:
- Removes legacy internal agents/executors/validators/dataset utilities in favor of the new workflow-based architecture.
- Adds workflow integration interfaces and cloud storage helper abstractions (with an S3 implementation and Azure/GCS stubs).
- Adds checkpointing utilities, new agents, new constants, updated docs/examples, and a multi-stage Dockerfile.
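
To make the storage abstraction described above concrete, here is a minimal sketch of how a helper interface with one working backend and one stub might be laid out. Every name in it (`StorageHelper`, `InMemoryStorageHelper`, `StubStorageHelper`, `upload`, `download`, `roundtrip`) is hypothetical and chosen for illustration; it is not taken from `plexe/integrations/storage/`, and the in-memory backend stands in for S3 so the example runs without cloud credentials.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageHelper(ABC):
    """Hypothetical storage interface (assumption, not the actual plexe API)."""

    @abstractmethod
    def upload(self, local_path: Path, remote_key: str) -> str:
        """Copy a local file to remote storage and return its URI."""

    @abstractmethod
    def download(self, remote_key: str, local_path: Path) -> Path:
        """Copy a remote object to a local path and return that path."""


class InMemoryStorageHelper(StorageHelper):
    """Working backend that keeps objects in a dict (stand-in for S3)."""

    def __init__(self, prefix: str = "mem://bucket") -> None:
        self._prefix = prefix
        self._objects: dict[str, bytes] = {}

    def upload(self, local_path: Path, remote_key: str) -> str:
        self._objects[remote_key] = Path(local_path).read_bytes()
        return f"{self._prefix}/{remote_key}"

    def download(self, remote_key: str, local_path: Path) -> Path:
        local_path = Path(local_path)
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(self._objects[remote_key])
        return local_path


class StubStorageHelper(StorageHelper):
    """Placeholder backend, in the spirit of a not-yet-implemented provider."""

    def upload(self, local_path: Path, remote_key: str) -> str:
        raise NotImplementedError("upload is not implemented for this backend")

    def download(self, remote_key: str, local_path: Path) -> Path:
        raise NotImplementedError("download is not implemented for this backend")


def roundtrip(helper: StorageHelper, payload: bytes) -> bytes:
    """Upload a payload through the helper, download it again, return the bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "model.bin"
        src.write_bytes(payload)
        helper.upload(src, "artifacts/model.bin")
        dst = helper.download("artifacts/model.bin", Path(tmp) / "out" / "model.bin")
        return dst.read_bytes()
```

Under these assumptions, callers depend only on the abstract interface, so a stubbed provider fails loudly with `NotImplementedError` instead of silently doing nothing.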
Reviewed changes
Copilot reviewed 120 out of 288 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| plexe/internal/schemas/resolver.py | Removed legacy schema resolver implementation |
| plexe/internal/models/validation/validator.py | Removed legacy validation primitives |
| plexe/internal/models/validation/primitives/syntax.py | Removed legacy syntax validator |
| plexe/internal/models/validation/primitives/security.py | Removed legacy security validator stub |
| plexe/internal/models/validation/primitives/predict.py | Removed legacy predictor runtime validator |
| plexe/internal/models/validation/composites/training.py | Removed legacy composite validator |
| plexe/internal/models/validation/composites/inference.py | Removed legacy composite validator |
| plexe/internal/models/validation/composites/__init__.py | Removed legacy exports |
| plexe/internal/models/validation/composite.py | Removed legacy composite validator |
| plexe/internal/models/generation/training.py | Removed legacy code generation path |
| plexe/internal/models/generation/review.py | Removed legacy model reviewer |
| plexe/internal/models/generation/planning.py | Removed legacy solution planning generator |
| plexe/internal/models/execution/ray_executor.py | Removed legacy Ray executor |
| plexe/internal/models/execution/process_executor.py | Removed legacy process executor |
| plexe/internal/models/execution/executor.py | Removed legacy executor interface/result |
| plexe/internal/models/execution/docker_executor.py | Removed legacy docker executor stub |
| plexe/internal/models/entities/metric.py | Removed legacy Metric/Comparator types |
| plexe/internal/models/entities/description.py | Removed legacy model description types |
| plexe/internal/models/entities/code.py | Removed legacy Code entity |
| plexe/internal/models/entities/artifact.py | Removed legacy Artifact entity |
| plexe/internal/models/callbacks/checkpoint.py | Removed legacy checkpoint callback |
| plexe/internal/models/callbacks/chain_of_thought.py | Removed legacy chain-of-thought callback |
| plexe/internal/datasets/generator.py | Removed legacy dataset generator |
| plexe/internal/datasets/core/validation/eda.py | Removed legacy EDA validator notebook flow |
| plexe/internal/datasets/core/validation/base.py | Removed legacy dataset validator base |
| plexe/internal/datasets/core/generation/base.py | Removed legacy data generator base |
| plexe/internal/datasets/config.py | Removed legacy dataset config |
| plexe/internal/datasets/__init__.py | Removed legacy dataset service entrypoint |
| plexe/internal/common/utils/response.py | Removed legacy LLM response helpers |
| plexe/internal/common/utils/pydantic_utils.py | Removed legacy pydantic helpers |
| plexe/internal/common/utils/prompt_utils.py | Removed legacy prompt helpers |
| plexe/internal/common/utils/pandas_utils.py | Removed legacy pandas dtype helpers |
| plexe/internal/common/utils/model_utils.py | Removed legacy model utils |
| plexe/internal/common/utils/model_state.py | Removed legacy model state enum |
| plexe/internal/common/utils/markdown_utils.py | Removed legacy markdown report utilities |
| plexe/internal/common/utils/dependency_utils.py | Removed legacy optional-deps decorator |
| plexe/internal/common/utils/dataset_storage.py | Removed legacy dataset storage/shared-mem utilities |
| plexe/internal/common/utils/chain_of_thought/protocol.py | Removed legacy chain-of-thought protocol |
| plexe/internal/common/utils/chain_of_thought/emitters.py | Removed legacy chain-of-thought emitters |
| plexe/internal/common/utils/chain_of_thought/callable.py | Removed legacy chain-of-thought callable |
| plexe/internal/common/utils/chain_of_thought/adapters.py | Removed legacy chain-of-thought adapters |
| plexe/internal/common/utils/chain_of_thought/__init__.py | Removed legacy chain-of-thought package exports |
| plexe/internal/common/utils/agents.py | Removed legacy prompt template merge helper |
| plexe/internal/common/utils/__init__.py | Removed legacy utils package docstring |
| plexe/internal/common/provider.py | Removed legacy Provider implementation |
| plexe/internal/common/datasets/tabular.py | Removed legacy TabularDataset |
| plexe/internal/common/datasets/interface.py | Removed legacy dataset interfaces |
| plexe/internal/common/datasets/adapter.py | Removed legacy DatasetAdapter |
| plexe/integrations/storage/s3.py | Added S3 storage helper implementation |
| plexe/integrations/storage/gcs.py | Added GCS helper stub |
| plexe/integrations/storage/azure.py | Added Azure Blob helper stub |
| plexe/integrations/storage/__init__.py | Added StorageHelper interface |
| plexe/integrations/base.py | Added WorkflowIntegration interface |
| plexe/fileio.py | Removed legacy file I/O API |
| plexe/execution/training/runner.py | Added TrainingRunner interface |
| plexe/execution/dataproc/__init__.py | Added dataproc package marker |
| plexe/execution/__init__.py | Added execution package marker |
| plexe/datasets.py | Removed legacy DatasetGenerator API |
| plexe/core/state.py | Removed legacy ModelState enum (core) |
| plexe/core/object_registry.py | Removed legacy object registry |
| plexe/core/interfaces/predictor.py | Removed legacy Predictor interface |
| plexe/core/interfaces/feature_transformer.py | Removed legacy FeatureTransformer interface |
| plexe/core/entities/solution.py | Removed legacy Solution entity |
| plexe/core/__init__.py | Removed legacy core package docstring |
| plexe/constants.py | Added centralized constants for workflow |
| plexe/checkpointing.py | Added local-only checkpointing utilities |
| plexe/callbacks.py | Removed legacy callback API |
| plexe/agents/utils.py | Added shared agent utilities |
| plexe/agents/statistical_analyser.py | Added statistical profiling agent |
| plexe/agents/schema_resolver.py | Removed legacy schema resolver agent |
| plexe/agents/sampler.py | Added intelligent sampling agent |
| plexe/agents/model_trainer.py | Removed legacy model trainer agent |
| plexe/agents/model_tester.py | Removed legacy model tester agent |
| plexe/agents/model_planner.py | Removed legacy model planner agent |
| plexe/agents/model_packager.py | Removed legacy model packager agent |
| plexe/agents/ml_task_analyser.py | Added ML task analysis agent |
| plexe/agents/metric_selector.py | Added metric selection agent |
| plexe/agents/metric_implementer.py | Added metric implementation agent |
| plexe/agents/layout_detector.py | Added data layout detection agent |
| plexe/agents/insight_extractor.py | Added insight extraction agent |
| plexe/agents/feature_engineer.py | Removed legacy feature engineering agent |
| plexe/agents/dataset_analyser.py | Removed legacy EDA agent |
| plexe/agents/conversational.py | Removed legacy conversational agent |
| plexe/agents/__init__.py | Removed legacy agents package docstring |
| plexe/__init__.py | Removed legacy public exports |
| examples/spaceship_titanic.py | Removed legacy example entrypoint |
| examples/santander_transactions.py | Removed legacy example entrypoint |
| examples/local/spaceship_titanic.py | Added updated local workflow example |
| examples/house_prices.py | Removed legacy example entrypoint |
| examples/dataset_generation.py | Removed legacy synthetic dataset example |
| examples/dataset_augmentation.py | Removed legacy augmentation example |
| Dockerfile | Added multi-stage Dockerfile for local Spark + Databricks |
| CLAUDE.md | Updated repository guidance for new workflow |
| .dockerignore | Added dockerignore for cleaner builds |
This PR updates the `plexe` project with the latest version of Plexe's model builder, which was developed and tested internally over the last few months. This newer version of model builder differs from `plexe` version `0.26.2` in significant ways. The main differences are:
- … `Model`, `ModelBuilder`, etc. The primary usage pattern for this new version of `plexe` is to run the `main` module to start the model build.
- … `plexe` package at all. Build the model with `plexe`, use it wherever you want.

The PR is too large for code review, but this is fine. The code is being directly moved from an internal repo after several months of internal use.