@marcellodebernardi
Contributor

This PR updates the plexe project with the latest version of Plexe's model builder, which was developed and tested internally over the last few months. This newer version of the model builder differs significantly from plexe version 0.26.2. The main differences are:

  1. While we still distribute this as a Python package, we no longer think of this as a "library" consisting of reusable programming primitives such as Model, ModelBuilder, etc. The primary usage pattern for this new version of plexe is to run the main module to start the model build.
  2. The implementation has been completely overhauled. Instead of a supervisor agent coordinating a team of agents, we now use a pre-defined workflow to orchestrate agents that work on individual tasks. Furthermore, the training and inference code used for the ML models we train is no longer fully generated by LLMs; instead, we have "templates" for specific model types, with the AI agents only needing to "fill in the details". This change has made the application significantly more robust, because it is less vulnerable to hallucinations.
  3. The final product of the model building process is a "model package" that can be used anywhere, and doesn't depend on the plexe package at all. Build the model with plexe, use it wherever you want.
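
To illustrate point 2, here is a minimal sketch of a fixed workflow whose steps only fill slots in a code template. Everything in it is an assumption made for illustration (`BuildState`, `TRAIN_TEMPLATE`, `analyse_task`, `render_training_code`, `run_workflow`); none of these names are taken from the plexe codebase, and the "agent" is a hard-coded stand-in rather than an LLM call.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable

# Hypothetical template for one model type: the code skeleton is fixed and
# only the $-placeholders are left for an agent to fill in.
TRAIN_TEMPLATE = Template(
    "model = $estimator(**$params)\n"
    "model.fit(X_train, y_train)\n"
)


@dataclass
class BuildState:
    """Shared state passed from step to step (illustrative only)."""
    task: str
    details: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def analyse_task(state: BuildState) -> BuildState:
    # Stand-in for an LLM-backed agent: it chooses template details,
    # it does not write free-form training code.
    state.details = {
        "estimator": "RandomForestClassifier",
        "params": {"n_estimators": 100},
    }
    state.log.append("analyse_task")
    return state


def render_training_code(state: BuildState) -> BuildState:
    # Fill the fixed template with the details chosen by the previous step.
    state.details["code"] = TRAIN_TEMPLATE.substitute(
        estimator=state.details["estimator"],
        params=state.details["params"],
    )
    state.log.append("render_training_code")
    return state


# The step order is defined ahead of time instead of being decided
# at runtime by a supervisor agent.
WORKFLOW: list[Callable[[BuildState], BuildState]] = [
    analyse_task,
    render_training_code,
]


def run_workflow(task: str) -> BuildState:
    state = BuildState(task=task)
    for step in WORKFLOW:
        state = step(state)
    return state
```

Calling `run_workflow("predict churn")` runs the two steps in their declared order and leaves the rendered snippet in `state.details["code"]`. The design point is that the agent's output is constrained to a small set of values, so a hallucinated answer can at worst produce a bad parameter rather than arbitrary code.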

The PR is too large for code review, but this is fine. The code is being directly moved from an internal repo after several months of internal use.

Copilot AI review requested due to automatic review settings February 8, 2026 20:45
@greptile-apps
greptile-apps bot commented Feb 8, 2026

Too many files changed for review. (288 files found, 100 file limit)

Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the prior “library-style” model builder internals with a new guardrailed, workflow-oriented implementation, and adds new integration/storage building blocks (notably S3) plus updated examples and containerization.

Changes:

  • Removes legacy internal agents/executors/validators/dataset utilities in favor of the new workflow-based architecture.
  • Adds workflow integration interfaces and cloud storage helper abstractions (with an S3 implementation and Azure/GCS stubs).
  • Adds checkpointing utilities, new agents, new constants, updated docs/examples, and a multi-stage Dockerfile.
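
As a rough picture of what a storage helper abstraction with one working backend and stubbed backends can look like, here is a minimal sketch. The class name `StorageHelper` appears in the file summary, but the method names (`upload`, `download`) and the two concrete classes are assumptions made for illustration, not the interface added in this PR. A local directory stands in for a bucket so the example runs without cloud credentials.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class StorageHelper(ABC):
    """Hypothetical interface; method names are assumptions, not plexe's API."""

    @abstractmethod
    def upload(self, local_path: Path, key: str) -> str:
        """Copy a local file to the store and return its location."""

    @abstractmethod
    def download(self, key: str, local_path: Path) -> Path:
        """Copy an object from the store to a local path."""


class LocalDirStorageHelper(StorageHelper):
    """Illustrative backend that treats a directory as the 'bucket'."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def upload(self, local_path: Path, key: str) -> str:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)
        return str(target)

    def download(self, key: str, local_path: Path) -> Path:
        local_path = Path(local_path)
        local_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(self.root / key, local_path)
        return local_path


class StubStorageHelper(StorageHelper):
    """Placeholder backend: satisfies the interface but is not implemented."""

    def upload(self, local_path: Path, key: str) -> str:
        raise NotImplementedError("upload is not implemented for this backend")

    def download(self, key: str, local_path: Path) -> Path:
        raise NotImplementedError("download is not implemented for this backend")
```

Code that depends only on `StorageHelper` can be pointed at any backend, and a stub fails loudly at call time instead of silently doing nothing.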

Reviewed changes

Copilot reviewed 120 out of 288 changed files in this pull request and generated 13 comments.

File Description
plexe/internal/schemas/resolver.py Removed legacy schema resolver implementation
plexe/internal/models/validation/validator.py Removed legacy validation primitives
plexe/internal/models/validation/primitives/syntax.py Removed legacy syntax validator
plexe/internal/models/validation/primitives/security.py Removed legacy security validator stub
plexe/internal/models/validation/primitives/predict.py Removed legacy predictor runtime validator
plexe/internal/models/validation/composites/training.py Removed legacy composite validator
plexe/internal/models/validation/composites/inference.py Removed legacy composite validator
plexe/internal/models/validation/composites/__init__.py Removed legacy exports
plexe/internal/models/validation/composite.py Removed legacy composite validator
plexe/internal/models/generation/training.py Removed legacy code generation path
plexe/internal/models/generation/review.py Removed legacy model reviewer
plexe/internal/models/generation/planning.py Removed legacy solution planning generator
plexe/internal/models/execution/ray_executor.py Removed legacy Ray executor
plexe/internal/models/execution/process_executor.py Removed legacy process executor
plexe/internal/models/execution/executor.py Removed legacy executor interface/result
plexe/internal/models/execution/docker_executor.py Removed legacy docker executor stub
plexe/internal/models/entities/metric.py Removed legacy Metric/Comparator types
plexe/internal/models/entities/description.py Removed legacy model description types
plexe/internal/models/entities/code.py Removed legacy Code entity
plexe/internal/models/entities/artifact.py Removed legacy Artifact entity
plexe/internal/models/callbacks/checkpoint.py Removed legacy checkpoint callback
plexe/internal/models/callbacks/chain_of_thought.py Removed legacy chain-of-thought callback
plexe/internal/datasets/generator.py Removed legacy dataset generator
plexe/internal/datasets/core/validation/eda.py Removed legacy EDA validator notebook flow
plexe/internal/datasets/core/validation/base.py Removed legacy dataset validator base
plexe/internal/datasets/core/generation/base.py Removed legacy data generator base
plexe/internal/datasets/config.py Removed legacy dataset config
plexe/internal/datasets/__init__.py Removed legacy dataset service entrypoint
plexe/internal/common/utils/response.py Removed legacy LLM response helpers
plexe/internal/common/utils/pydantic_utils.py Removed legacy pydantic helpers
plexe/internal/common/utils/prompt_utils.py Removed legacy prompt helpers
plexe/internal/common/utils/pandas_utils.py Removed legacy pandas dtype helpers
plexe/internal/common/utils/model_utils.py Removed legacy model utils
plexe/internal/common/utils/model_state.py Removed legacy model state enum
plexe/internal/common/utils/markdown_utils.py Removed legacy markdown report utilities
plexe/internal/common/utils/dependency_utils.py Removed legacy optional-deps decorator
plexe/internal/common/utils/dataset_storage.py Removed legacy dataset storage/shared-mem utilities
plexe/internal/common/utils/chain_of_thought/protocol.py Removed legacy chain-of-thought protocol
plexe/internal/common/utils/chain_of_thought/emitters.py Removed legacy chain-of-thought emitters
plexe/internal/common/utils/chain_of_thought/callable.py Removed legacy chain-of-thought callable
plexe/internal/common/utils/chain_of_thought/adapters.py Removed legacy chain-of-thought adapters
plexe/internal/common/utils/chain_of_thought/__init__.py Removed legacy chain-of-thought package exports
plexe/internal/common/utils/agents.py Removed legacy prompt template merge helper
plexe/internal/common/utils/__init__.py Removed legacy utils package docstring
plexe/internal/common/provider.py Removed legacy Provider implementation
plexe/internal/common/datasets/tabular.py Removed legacy TabularDataset
plexe/internal/common/datasets/interface.py Removed legacy dataset interfaces
plexe/internal/common/datasets/adapter.py Removed legacy DatasetAdapter
plexe/integrations/storage/s3.py Added S3 storage helper implementation
plexe/integrations/storage/gcs.py Added GCS helper stub
plexe/integrations/storage/azure.py Added Azure Blob helper stub
plexe/integrations/storage/__init__.py Added StorageHelper interface
plexe/integrations/base.py Added WorkflowIntegration interface
plexe/fileio.py Removed legacy file I/O API
plexe/execution/training/runner.py Added TrainingRunner interface
plexe/execution/dataproc/__init__.py Added dataproc package marker
plexe/execution/__init__.py Added execution package marker
plexe/datasets.py Removed legacy DatasetGenerator API
plexe/core/state.py Removed legacy ModelState enum (core)
plexe/core/object_registry.py Removed legacy object registry
plexe/core/interfaces/predictor.py Removed legacy Predictor interface
plexe/core/interfaces/feature_transformer.py Removed legacy FeatureTransformer interface
plexe/core/entities/solution.py Removed legacy Solution entity
plexe/core/__init__.py Removed legacy core package docstring
plexe/constants.py Added centralized constants for workflow
plexe/checkpointing.py Added local-only checkpointing utilities
plexe/callbacks.py Removed legacy callback API
plexe/agents/utils.py Added shared agent utilities
plexe/agents/statistical_analyser.py Added statistical profiling agent
plexe/agents/schema_resolver.py Removed legacy schema resolver agent
plexe/agents/sampler.py Added intelligent sampling agent
plexe/agents/model_trainer.py Removed legacy model trainer agent
plexe/agents/model_tester.py Removed legacy model tester agent
plexe/agents/model_planner.py Removed legacy model planner agent
plexe/agents/model_packager.py Removed legacy model packager agent
plexe/agents/ml_task_analyser.py Added ML task analysis agent
plexe/agents/metric_selector.py Added metric selection agent
plexe/agents/metric_implementer.py Added metric implementation agent
plexe/agents/layout_detector.py Added data layout detection agent
plexe/agents/insight_extractor.py Added insight extraction agent
plexe/agents/feature_engineer.py Removed legacy feature engineering agent
plexe/agents/dataset_analyser.py Removed legacy EDA agent
plexe/agents/conversational.py Removed legacy conversational agent
plexe/agents/__init__.py Removed legacy agents package docstring
plexe/__init__.py Removed legacy public exports
examples/spaceship_titanic.py Removed legacy example entrypoint
examples/santander_transactions.py Removed legacy example entrypoint
examples/local/spaceship_titanic.py Added updated local workflow example
examples/house_prices.py Removed legacy example entrypoint
examples/dataset_generation.py Removed legacy synthetic dataset example
examples/dataset_augmentation.py Removed legacy augmentation example
Dockerfile Added multi-stage Dockerfile for local Spark + Databricks
CLAUDE.md Updated repository guidance for new workflow
.dockerignore Added dockerignore for cleaner builds

@marcellodebernardi marcellodebernardi merged commit 8dc6dc2 into main Feb 9, 2026
12 checks passed
@marcellodebernardi marcellodebernardi deleted the feature/model-builder-v2 branch February 9, 2026 11:01