# Refactor SemOptimizer API #299
Base branch: `devel`
# Using NLopt.jl
When [`NLopt.jl`](https://github.com/jump-dev/NLopt.jl) is loaded in the running Julia session, it can be used by the [`SemOptimizer`](@ref) by specifying `engine = :NLopt` (see [NLopt-specific options](@ref `SemOptimizer(Val(:NLopt))`)). Among other things, `NLopt` enables constrained optimization of SEM models, which is explained in the [Constrained optimization](@ref) section.
We can override the default *NLopt* algorithm (LBFGS) and instead use the *augmented Lagrangian* method with LBFGS as the *local* optimization algorithm, stop at a maximum of 200 evaluations, and use a relative tolerance of the objective value of `1e-6` as the stopping criterion for the local algorithm:
```julia
using NLopt

my_optimizer = SemOptimizer(;
    engine = :NLopt,
    algorithm = :AUGLAG,
    options = Dict(:maxeval => 200),
    local_algorithm = :LD_LBFGS,
    local_options = Dict(:ftol_rel => 1e-6)
)
```
To see how to use the optimizer to actually fit a model, check out the [Model fitting](@ref) section.
In the *NLopt* docs, you can find details about the [optimization algorithms](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/) and a [tutorial](https://nlopt.readthedocs.io/en/latest/NLopt_Introduction/) that demonstrates how to tweak their behavior.
To choose an algorithm, just pass its name without the `NLOPT_` prefix (for example, `NLOPT_LD_SLSQP` can be used by passing `algorithm = :LD_SLSQP`).
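For instance, a minimal sketch of this naming convention (the variable name is illustrative; assumes `StructuralEquationModels` and `NLopt` are loaded):

```julia
using StructuralEquationModels, NLopt

# NLOPT_LD_SLSQP becomes :LD_SLSQP once the NLOPT_ prefix is dropped
slsqp_optimizer = SemOptimizer(engine = :NLopt, algorithm = :LD_SLSQP)
```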
The README of the [*NLopt.jl*](https://github.com/JuliaOpt/NLopt.jl) package may also be helpful, and provides a list of options:
- `algorithm`
- `stopval`
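As an illustration of combining these options (a sketch, not from the original docs; in NLopt, `stopval` halts the optimization once the objective reaches the given value):

```julia
using StructuralEquationModels, NLopt

# Hypothetical example: stop when the objective falls below 1.0,
# or after 500 evaluations, whichever happens first.
my_optimizer = SemOptimizer(
    engine = :NLopt,
    algorithm = :LD_LBFGS,
    options = Dict(:stopval => 1.0, :maxeval => 500)
)
```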
For ridge regularization, you can simply use `SemRidge` as an additional loss function (for example, a model with the loss functions `SemML` and `SemRidge` corresponds to ridge-regularized maximum likelihood estimation).
You can define lasso, elastic net, and other forms of regularization using [`ProximalOperators.jl`](https://github.com/JuliaFirstOrder/ProximalOperators.jl) and optimize the SEM model with [`ProximalAlgorithms.jl`](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl), which provides so-called *proximal optimization* algorithms.
```@setup reg
using StructuralEquationModels, ProximalAlgorithms, ProximalOperators
```
## Proximal optimization
With the *ProximalAlgorithms* package loaded, it is now possible to use the `:Proximal` optimization engine in `SemOptimizer` to estimate regularized models.
```julia
SemOptimizer(;
    engine = :Proximal,
    algorithm = ProximalAlgorithms.PANOC(),
    operator_g,
    operator_h = nothing
)
```
The *proximal operator* (aka the *regularization function*) is passed as `operator_g`; see the [available operators](https://juliafirstorder.github.io/ProximalOperators.jl/stable/functions/). The `algorithm` is chosen from the [available algorithms](https://juliafirstorder.github.io/ProximalAlgorithms.jl/stable/guide/implemented_algorithms/).
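For example, a hedged sketch that swaps in the forward-backward splitting algorithm from *ProximalAlgorithms* (the variable name is illustrative; assumes the packages above are loaded):

```julia
using StructuralEquationModels, ProximalAlgorithms, ProximalOperators

# Forward-backward splitting instead of the PANOC default,
# with an l1 (lasso) penalty as the proximal operator.
optimizer_fb = SemOptimizer(
    engine = :Proximal,
    algorithm = ProximalAlgorithms.ForwardBackward(),
    operator_g = NormL1(0.02)
)
```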
## First example - lasso
We labeled the covariances between the items because we want to regularize those:
```@example reg
cov_inds = getindex.(
    Ref(param_indices(model)),
    [:cov_15, :cov_24, :cov_26, :cov_37, :cov_48, :cov_68])
```
The lasso penalty is defined as
```math
\sum_i \lambda_i \lvert \theta_i \rvert
```
In `ProximalOperators.jl`, lasso regularization is represented by the [`NormL1`](https://juliafirstorder.github.io/ProximalOperators.jl/stable/functions/#ProximalOperators.NormL1) operator. It allows controlling the amount of regularization individually for each SEM model parameter via a vector of hyperparameters (`λ`). To regularize only the observed item covariances, we define `λ` as
```@example reg
λ = zeros(31); λ[cov_inds] .= 0.02
```
```@example reg
optimizer_lasso = SemOptimizer(
    engine = :Proximal,
    operator_g = NormL1(λ)
)
```
Let's fit the regularized model
```@example reg
fit_lasso = fit(optimizer_lasso, model)
```
and compare the solution to unregularized estimates:
```@example reg
update_partable!(partable, :estimate_lasso, fit_lasso, solution(fit_lasso))
details(partable)
```
Instead of explicitly defining a `SemOptimizer` object, you can also pass `engine = :Proximal` and additional keyword arguments directly to the `fit` function:
```@example reg
fit_lasso2 = fit(model; engine = :Proximal, operator_g = NormL1(λ))
```
## Second example - mixed l1 and l0 regularization
You can choose to penalize different parameters with different types of regularization functions. Let's use the *lasso* (*l1*) again on the covariances, but additionally penalize the error variances of the observed items via *l0* regularization.
The *l0* penalty is defined as
```math
l_0 = \lambda \, \mathrm{nnz}(\theta)
```
where `nnz(θ)` is the number of non-zero parameters.
Since we apply *l1* and *l0* to disjoint sets of parameters, this regularization can be represented as a sum of *separable proximal operators* (i.e. no parameter is penalized twice), implemented by the [`SlicedSeparableSum`](https://juliafirstorder.github.io/ProximalOperators.jl/stable/calculus/#ProximalOperators.SlicedSeparableSum) operator:
```@example reg
l0_and_l1_reg = SlicedSeparableSum(
    (NormL0(20.0), NormL1(0.02), NormL0(0.0)),
    ([cov_inds], [9:11], [vcat(1:8, 12:25)]))

fit_mixed = fit(model; engine = :Proximal, operator_g = l0_and_l1_reg)
```
Let's again compare the different results: