Replace dask_ml.wrappers.Incremental with custom Incremental class#855
Replace dask_ml.wrappers.Incremental with custom Incremental class#855ayushdg merged 14 commits intodask-contrib:mainfrom
dask_ml.wrappers.Incremental with custom Incremental class#855Conversation
dask_ml.wrappers.Incremental with custom Incremental classdask_ml.wrappers.Incremental with custom Incremental class
Codecov Report
@@ Coverage Diff @@
## main #855 +/- ##
==========================================
+ Coverage 75.20% 75.52% +0.32%
==========================================
Files 72 73 +1
Lines 3779 3972 +193
Branches 674 710 +36
==========================================
+ Hits 2842 3000 +158
- Misses 804 810 +6
- Partials 133 162 +29
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
| _assert_eq(l, r, name=attr, **kwargs) | ||
|
|
||
|
|
||
| def test_parallelpostfit_basic(): |
There was a problem hiding this comment.
For the unit tests, I added test_parallelpostfit_basic (originally test_it_works), test_predict, and test_transform from https://github.com/dask/dask-ml/blob/main/tests/test_parallel_post_fit.py, and test_incremental_basic from https://github.com/dask/dask-ml/blob/main/tests/test_incremental.py
charlesbluca
left a comment
There was a problem hiding this comment.
Thanks @sarahyurick! Code changes look good, just a couple remaining mentions of dask-ml in the affected files that could be removed (assuming the remaining uses will be handled in #886)
VibhuJawa
left a comment
There was a problem hiding this comment.
Have requested small clarifications, other wise implementation looks good.
charlesbluca
left a comment
There was a problem hiding this comment.
Thanks @sarahyurick!
| model with a :class:`dask_sql.physical.rel.custom.wrappers.ParallelPostFit`. | ||
| Have a look into the | ||
| [dask-ml docu](https://ml.dask.org/meta-estimators.html#parallel-prediction-and-transformation) | ||
| to learn more about it. Defaults to false. Typically you set | ||
| it to true for sklearn models if predicting on big data. | ||
| Defaults to false. Typically you set it to true for | ||
| sklearn models if predicting on big data. |
There was a problem hiding this comment.
Unrelated to this PR , can you file an issue to clean up the wrap_predict and wrap_fit arguments. I think we can get rid of this or do a better default based on the class name of the model.
For sklearn and single gpu cuML models, switch this to true else switch this to False.
| logger.info(tune_fit_kwargs) | ||
| search.fit(X, y, **tune_fit_kwargs) | ||
| search.fit( | ||
| X.to_dask_array(lengths=True), |
There was a problem hiding this comment.
Could experimentClass be a gpu based model or is it limited to cpu based ones only?
There was a problem hiding this comment.
I'm not sure, since I think every example I've seen with experiment_class has been with a CPU dask_ml model... I can try to get a better idea of the scope when the pytests in #886 are updated with other non-dask_ml models. We can see about adding GPU tests there too.
Closes #839.
In addition to #832, we want to create a custom implementation for Dask-ML's
Incrementalclass as well.So as not to create any merge conflicts, I've only added a single file relating to the scorers used in Dask-ML's implementation. After #832 I will add the remaining functionality and changes needed in
dask_sql/physical/rel/custom/create_model.py,dask_sql/physical/rel/custom/wrappers.py(created in #832), anddocs/source/sql/ml.rst.@VibhuJawa