diff --git a/HISTORY.md b/HISTORY.md index 6ba087b76..6e979a4a4 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -9,6 +9,9 @@ * Fix CNE test tolerances ([#360](https://github.com/mlpack/ensmallen/pull/360)). + * Rename `SCD` optimizer, to `CD` + ([#379](https://github.com/mlpack/ensmallen/pull/379)). + ### ensmallen 2.19.1: "Eight Ball Deluxe" ###### 2023-01-30 * Avoid deprecation warnings in Armadillo 11.2+ diff --git a/doc/function_types.md b/doc/function_types.md index c6dbffa4f..418ac3501 100644 --- a/doc/function_types.md +++ b/doc/function_types.md @@ -307,7 +307,7 @@ regular implementation of the `Gradient()`, so that function may be omitted. If these functions are implemented, the following partially differentiable function optimizers can be used: - - [Stochastic Coordinate Descent](#stochastic-coordinate-descent-scd) + - [Coordinate Descent](#coordinate-descent-cd) ## Arbitrary separable functions diff --git a/doc/optimizers.md b/doc/optimizers.md index 2acb5a6ed..bb8a25073 100644 --- a/doc/optimizers.md +++ b/doc/optimizers.md @@ -778,6 +778,82 @@ optimizer2.Optimize(f, coordinates); * [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) * [SGD](#standard-sgd) +## Coordinate Descent (CD) + +*An optimizer for [partially differentiable functions](#partially-differentiable-functions).* + +Coordinate descent is a technique for minimizing a function by doing a line +search along a single direction at the current point in the iteration. The +direction (or "coordinate") can be chosen cyclically, randomly or in a greedy +fashion. 
+
+#### Constructors
+
+ * `CD<`_`DescentPolicyType`_`>()`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)`
+ * `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)`
+
+The _`DescentPolicyType`_ template parameter specifies the behavior of CD when
+selecting the next coordinate to descend with. The `RandomDescent`,
+`GreedyDescent`, and `CyclicDescent` classes are available for use. Custom
+behavior can be achieved by implementing a class with the same method
+signatures.
+
+For convenience, the following typedefs have been defined:
+
+ * `RandomCD` (equivalent to `CD<RandomDescent>`): selects coordinates randomly
+ * `GreedyCD` (equivalent to `CD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
+ * `CyclicCD` (equivalent to `CD<CyclicDescent>`): selects coordinates sequentially
+
+***Note***: `CD` used to be called `SCD`. Use of the name `SCD` is deprecated,
+and will be removed in ensmallen 3 and later.
+
+#### Attributes
+
+| **type** | **name** | **description** | **default** |
+|----------|----------|-----------------|-------------|
+| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
+| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
+| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
+| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
+| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |
+
+Attributes of the optimizer may also be modified via the member methods
+`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
+`DescentPolicy()`.
+
+Note that the default value for `descentPolicy` is the default constructor for
+_`DescentPolicyType`_.
+
+#### Examples
+
+
+Click to collapse/expand example code.
+
+
+```c++
+SparseTestFunction f;
+arma::mat coordinates = f.GetInitialPoint();
+
+RandomCD randomscd(0.01, 100000, 1e-5, 1e3);
+randomscd.Optimize(f, coordinates);
+
+GreedyCD greedyscd(0.01, 100000, 1e-5, 1e3);
+greedyscd.Optimize(f, coordinates);
+
+CyclicCD cyclicscd(0.01, 100000, 1e-5, 1e3);
+cyclicscd.Optimize(f, coordinates);
+```
+
+
+ +#### See also: + + * [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent) + * [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf) + * [Partially differentiable functions](#partially-differentiable-functions) + ## CMAES *An optimizer for [separable functions](#separable-functions).* @@ -2807,79 +2883,6 @@ optimizer.Optimize(f, coordinates); * [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) * [Differentiable separable functions](#differentiable-separable-functions) -## Stochastic Coordinate Descent (SCD) - -*An optimizer for [partially differentiable functions](#partially-differentiable-functions).* - -Stochastic Coordinate descent is a technique for minimizing a function by -doing a line search along a single direction at the current point in the -iteration. The direction (or "coordinate") can be chosen cyclically, randomly -or in a greedy fashion. - -#### Constructors - - * `SCD<`_`DescentPolicyType`_`>()` - * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)` - * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)` - * `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)` - -The _`DescentPolicyType`_ template parameter specifies the behavior of SCD when -selecting the next coordinate to descend with. The `RandomDescent`, -`GreedyDescent`, and `CyclicDescent` classes are available for use. Custom -behavior can be achieved by implementing a class with the same method -signatures. 
-
-For convenience, the following typedefs have been defined:
-
- * `RandomSCD` (equivalent to `SCD<RandomDescent>`): selects coordinates randomly
- * `GreedySCD` (equivalent to `SCD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
- * `CyclicSCD` (equivalent to `SCD<CyclicDescent>`): selects coordinates sequentially
-
-#### Attributes
-
-| **type** | **name** | **description** | **default** |
-|----------|----------|-----------------|-------------|
-| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
-| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
-| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
-| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
-| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |
-
-Attributes of the optimizer may also be modified via the member methods
-`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
-`DescentPolicy()`.
-
-Note that the default value for `descentPolicy` is the default constructor for
-_`DescentPolicyType`_.
-
-#### Examples
-
-
-Click to collapse/expand example code.
-
-
-```c++
-SparseTestFunction f;
-arma::mat coordinates = f.GetInitialPoint();
-
-RandomSCD randomscd(0.01, 100000, 1e-5, 1e3);
-randomscd.Optimize(f, coordinates);
-
-GreedySCD greedyscd(0.01, 100000, 1e-5, 1e3);
-greedyscd.Optimize(f, coordinates);
-
-CyclicSCD cyclicscd(0.01, 100000, 1e-5, 1e3);
-cyclicscd.Optimize(f, coordinates);
-```
-
-
- -#### See also: - - * [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent) - * [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf) - * [Partially differentiable functions](#partially-differentiable-functions) - ## Stochastic Gradient Descent with Restarts (SGDR) *An optimizer for [differentiable separable diff --git a/include/ensmallen.hpp b/include/ensmallen.hpp index be91cc23f..bacc21026 100644 --- a/include/ensmallen.hpp +++ b/include/ensmallen.hpp @@ -98,6 +98,7 @@ #include "ensmallen_bits/bigbatch_sgd/bigbatch_sgd.hpp" #include "ensmallen_bits/cmaes/cmaes.hpp" #include "ensmallen_bits/cmaes/active_cmaes.hpp" +#include "ensmallen_bits/cd/cd.hpp" #include "ensmallen_bits/cne/cne.hpp" #include "ensmallen_bits/de/de.hpp" #include "ensmallen_bits/eve/eve.hpp" @@ -119,7 +120,6 @@ #include "ensmallen_bits/sa/sa.hpp" #include "ensmallen_bits/sarah/sarah.hpp" -#include "ensmallen_bits/scd/scd.hpp" #include "ensmallen_bits/sdp/sdp.hpp" #include "ensmallen_bits/sdp/lrsdp.hpp" #include "ensmallen_bits/sdp/primal_dual.hpp" diff --git a/include/ensmallen_bits/scd/scd.hpp b/include/ensmallen_bits/cd/cd.hpp similarity index 84% rename from include/ensmallen_bits/scd/scd.hpp rename to include/ensmallen_bits/cd/cd.hpp index 1d0562d8e..062a210bc 100644 --- a/include/ensmallen_bits/scd/scd.hpp +++ b/include/ensmallen_bits/cd/cd.hpp @@ -1,16 +1,16 @@ /** - * @file scd.hpp + * @file cd.hpp * @author Shikhar Bhardwaj * - * Stochastic Coordinate Descent (SCD). + * Coordinate Descent (CD). * * ensmallen is free software; you may redistribute it and/or modify it under * the terms of the 3-clause BSD license. You should have received a copy of * the 3-clause BSD license along with ensmallen. If not, see * http://www.opensource.org/licenses/BSD-3-Clause for more information. 
*/ -#ifndef ENSMALLEN_SCD_SCD_HPP -#define ENSMALLEN_SCD_SCD_HPP +#ifndef ENSMALLEN_CD_CD_HPP +#define ENSMALLEN_CD_CD_HPP #include "descent_policies/cyclic_descent.hpp" #include "descent_policies/random_descent.hpp" @@ -42,7 +42,7 @@ namespace ens { * } * @endcode * - * SCD can optimize partially differentiable functions. For more details, see + * CD can optimize partially differentiable functions. For more details, see * the documentation on function types included with this distribution or on the * ensmallen website. * @@ -50,11 +50,11 @@ namespace ens { * coordinate for descent is selected. */ template -class SCD +class CD { public: /** - * Construct the SCD optimizer with the given function and parameters. The + * Construct the CD optimizer with the given function and parameters. The * default value here are not necessarily good for every problem, so it is * suggested that the values used are tailored for the task at hand. The * maximum number of iterations refers to the maximum number of "descents" @@ -70,11 +70,11 @@ class SCD * @param descentPolicy The policy to use for picking up the coordinate to * descend on. */ - SCD(const double stepSize = 0.01, - const size_t maxIterations = 100000, - const double tolerance = 1e-5, - const size_t updateInterval = 1e3, - const DescentPolicyType descentPolicy = DescentPolicyType()); + CD(const double stepSize = 0.01, + const size_t maxIterations = 100000, + const double tolerance = 1e-5, + const size_t updateInterval = 1e3, + const DescentPolicyType descentPolicy = DescentPolicyType()); /** * Optimize the given function using stochastic coordinate descent. The @@ -158,6 +158,24 @@ class SCD } // namespace ens // Include implementation. -#include "scd_impl.hpp" +#include "cd_impl.hpp" + +namespace ens { + +/** + * Backwards-compatibility alias; this can be removed after ensmallen 3.10.0. 
+ * The history here is that CD was originally named SCD, but that is an + * inaccurate name because this is not a stochastic technique; thus, it was + * renamed CD. + */ +template +using SCD = CD; + +// Convenience typedefs. +using RandomCD = CD<RandomDescent>; +using GreedyCD = CD<GreedyDescent>; +using CyclicCD = CD<CyclicDescent>; + +} // namespace ens #endif diff --git a/include/ensmallen_bits/scd/scd_impl.hpp b/include/ensmallen_bits/cd/cd_impl.hpp similarity index 88% rename from include/ensmallen_bits/scd/scd_impl.hpp rename to include/ensmallen_bits/cd/cd_impl.hpp index eec0672af..f7d860a54 100644 --- a/include/ensmallen_bits/scd/scd_impl.hpp +++ b/include/ensmallen_bits/cd/cd_impl.hpp @@ -1,26 +1,26 @@ /** - * @file scd_impl.hpp + * @file cd_impl.hpp * @author Shikhar Bhardwaj * - * Implementation of stochastic coordinate descent. + * Implementation of coordinate descent. * * ensmallen is free software; you may redistribute it and/or modify it under * the terms of the 3-clause BSD license. You should have received a copy of * the 3-clause BSD license along with ensmallen. If not, see * http://www.opensource.org/licenses/BSD-3-Clause for more information. */ -#ifndef ENSMALLEN_SCD_SCD_IMPL_HPP -#define ENSMALLEN_SCD_SCD_IMPL_HPP +#ifndef ENSMALLEN_CD_CD_IMPL_HPP +#define ENSMALLEN_CD_CD_IMPL_HPP // In case it hasn't been included yet. -#include "scd.hpp" +#include "cd.hpp" #include namespace ens { template -SCD::SCD( +CD::CD( const double stepSize, const size_t maxIterations, const double tolerance, @@ -41,7 +41,7 @@ template typename std::enable_if::value, typename MatType::elem_type>::type -SCD::Optimize( +CD::Optimize( ResolvableFunctionType& function, MatType& iterateIn, CallbackTypes&&... callbacks) @@ -94,12 +94,12 @@ SCD::Optimize( overallObjective, callbacks...); // Output current objective function. - Info << "SCD: iteration " << i << ", objective " << overallObjective + Info << "CD: iteration " << i << ", objective " << overallObjective << "."
<< std::endl; if (std::isnan(overallObjective) || std::isinf(overallObjective)) { - Warn << "SCD: converged to " << overallObjective << "; terminating" + Warn << "CD: converged to " << overallObjective << "; terminating" << " with failure. Try a smaller step size?" << std::endl; Callback::EndOptimization(*this, function, iterate, callbacks...); @@ -108,7 +108,7 @@ SCD::Optimize( if (std::abs(lastObjective - overallObjective) < tolerance) { - Info << "SCD: minimized within tolerance " << tolerance << "; " + Info << "CD: minimized within tolerance " << tolerance << "; " << "terminating optimization." << std::endl; Callback::EndOptimization(*this, function, iterate, callbacks...); @@ -119,7 +119,7 @@ SCD::Optimize( } } - Info << "SCD: maximum iterations (" << maxIterations << ") reached; " + Info << "CD: maximum iterations (" << maxIterations << ") reached; " << "terminating optimization." << std::endl; // Calculate and return final objective. diff --git a/include/ensmallen_bits/scd/descent_policies/cyclic_descent.hpp b/include/ensmallen_bits/cd/descent_policies/cyclic_descent.hpp similarity index 100% rename from include/ensmallen_bits/scd/descent_policies/cyclic_descent.hpp rename to include/ensmallen_bits/cd/descent_policies/cyclic_descent.hpp diff --git a/include/ensmallen_bits/scd/descent_policies/greedy_descent.hpp b/include/ensmallen_bits/cd/descent_policies/greedy_descent.hpp similarity index 100% rename from include/ensmallen_bits/scd/descent_policies/greedy_descent.hpp rename to include/ensmallen_bits/cd/descent_policies/greedy_descent.hpp diff --git a/include/ensmallen_bits/scd/descent_policies/random_descent.hpp b/include/ensmallen_bits/cd/descent_policies/random_descent.hpp similarity index 100% rename from include/ensmallen_bits/scd/descent_policies/random_descent.hpp rename to include/ensmallen_bits/cd/descent_policies/random_descent.hpp diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt index 8ce3123fa..df5938844 100644 --- a/tests/CMakeLists.txt 
+++ b/tests/CMakeLists.txt @@ -11,6 +11,7 @@ set(ENSMALLEN_TESTS_SOURCES aug_lagrangian_test.cpp bigbatch_sgd_test.cpp callbacks_test.cpp + cd_test.cpp cmaes_test.cpp cne_test.cpp de_test.cpp @@ -39,7 +40,6 @@ set(ENSMALLEN_TESTS_SOURCES rmsprop_test.cpp sa_test.cpp sarah_test.cpp - scd_test.cpp sdp_primal_dual_test.cpp sgdr_test.cpp sgd_test.cpp diff --git a/tests/scd_test.cpp b/tests/cd_test.cpp similarity index 82% rename from tests/scd_test.cpp rename to tests/cd_test.cpp index fa415942e..1e83a47d0 100644 --- a/tests/scd_test.cpp +++ b/tests/cd_test.cpp @@ -20,17 +20,17 @@ using namespace ens; using namespace ens::test; /** - * Test the correctness of the SCD implementation by using a dataset with a + * Test the correctness of the CD implementation by using a dataset with a * precalculated minima. */ -TEST_CASE("PreCalcSCDTest", "[SCDTest]") +TEST_CASE("PreCalcCDTest", "[CDTest]") { arma::mat predictors("0 0 0.4; 0 0 0.6; 0 0.3 0; 0.2 0 0; 0.2 -0.5 0;"); arma::Row responses("1 1 0;"); LogisticRegressionFunction f(predictors, responses, 0.0001); - SCD<> s(0.02, 60000, 1e-5); + CD<> s(0.02, 60000, 1e-5); arma::mat iterate = f.InitialPoint(); double objective = s.Optimize(f, iterate); @@ -39,47 +39,47 @@ TEST_CASE("PreCalcSCDTest", "[SCDTest]") } /** - * Test the correctness of the SCD implemenation by using the sparse test + * Test the correctness of the CD implemenation by using the sparse test * function, with disjoint features which optimize to a precalculated minima. */ -TEST_CASE("DisjointFeatureTest", "[SCDTest]") +TEST_CASE("DisjointFeatureTest", "[CDTest]") { - // The test function for parallel SGD should work with SCD, as the gradients + // The test function for parallel SGD should work with CD, as the gradients // of the individual functions are projections into the ith dimension. 
- SCD<> s(0.4); + CD<> s(0.4); FunctionTest(s, 0.01, 0.001); } /** - * Test the correctness of the SCD implemenation by using the sparse test + * Test the correctness of the CD implemenation by using the sparse test * function, with disjoint features which optimize to a precalculated minima. * Use arma::fmat. */ -TEST_CASE("DisjointFeatureFMatTest", "[SCDTest]") +TEST_CASE("DisjointFeatureFMatTest", "[CDTest]") { - // The test function for parallel SGD should work with SCD, as the gradients + // The test function for parallel SGD should work with CD, as the gradients // of the individual functions are projections into the ith dimension. - SCD<> s(0.4); + CD<> s(0.4); FunctionTest(s, 0.2, 0.02); } /** - * Test the correctness of the SCD implemenation by using the sparse test + * Test the correctness of the CD implemenation by using the sparse test * function, with disjoint features which optimize to a precalculated minima. * Use arma::sp_mat. */ -TEST_CASE("DisjointFeatureSpMatTest", "[SCDTest]") +TEST_CASE("DisjointFeatureSpMatTest", "[CDTest]") { - // The test function for parallel SGD should work with SCD, as the gradients + // The test function for parallel SGD should work with CD, as the gradients // of the individual functions are projections into the ith dimension. - SCD<> s(0.4); + CD<> s(0.4); FunctionTest(s, 0.01, 0.001); } /** * Test the greedy descent policy. */ -TEST_CASE("GreedyDescentTest", "[SCDTest]") +TEST_CASE("GreedyDescentTest", "[CDTest]") { // In the sparse test function, the given point has the maximum gradient at // the feature with index 2. @@ -105,7 +105,7 @@ TEST_CASE("GreedyDescentTest", "[SCDTest]") /** * Test the cyclic descent policy. */ -TEST_CASE("CyclicDescentTest", "[SCDTest]") +TEST_CASE("CyclicDescentTest", "[CDTest]") { const size_t features = 10; struct DummyFunction @@ -130,7 +130,7 @@ TEST_CASE("CyclicDescentTest", "[SCDTest]") /** * Test the random descent policy. 
*/ -TEST_CASE("RandomDescentTest", "[SCDTest]") +TEST_CASE("RandomDescentTest", "[CDTest]") { const size_t features = 10; struct DummyFunction @@ -158,7 +158,7 @@ TEST_CASE("RandomDescentTest", "[SCDTest]") /** * Test that LogisticRegressionFunction::PartialGradient() works as expected. */ -TEST_CASE("LogisticRegressionFunctionPartialGradientTest", "[SCDTest]") +TEST_CASE("LogisticRegressionFunctionPartialGradientTest", "[CDTest]") { // Evaluate the gradient and feature gradient and equate. arma::mat predictors("0 0 0.4; 0 0 0.6; 0 0.3 0; 0.2 0 0; 0.2 -0.5 0;"); @@ -184,7 +184,7 @@ TEST_CASE("LogisticRegressionFunctionPartialGradientTest", "[SCDTest]") /** * Test that SoftmaxRegressionFunction::PartialGradient() works as expected. */ -TEST_CASE("SoftmaxRegressionFunctionPartialGradientTest", "[SCDTest]") +TEST_CASE("SoftmaxRegressionFunctionPartialGradientTest", "[CDTest]") { const size_t points = 1000; const size_t inputSize = 10;
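
The documentation added in this change describes coordinate descent as repeatedly picking one coordinate (cyclically, randomly, or greedily) and stepping along it, with a convergence check every `updateInterval` iterations. The following is a minimal, standalone sketch of that loop, assuming a simple separable quadratic objective. It does not use ensmallen, and the names `SeparableQuadratic`, `CyclicPolicy`, `GreedyPolicy`, `NextCoordinate()`, and `CoordinateDescentSketch()` are hypothetical, chosen only for illustration; they are not the library's policy interface or its implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical objective for illustration only:
// f(x) = sum_j 0.5 * a[j] * (x[j] - b[j])^2, minimized at x = b.
struct SeparableQuadratic
{
  std::vector<double> a; // Positive curvature of each coordinate.
  std::vector<double> b; // Location of the minimum.

  std::size_t NumFeatures() const { return a.size(); }

  double Evaluate(const std::vector<double>& x) const
  {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j)
      sum += 0.5 * a[j] * (x[j] - b[j]) * (x[j] - b[j]);
    return sum;
  }

  // Derivative of f with respect to coordinate j only.
  double PartialGradient(const std::vector<double>& x,
                         const std::size_t j) const
  {
    return a[j] * (x[j] - b[j]);
  }
};

// Visits coordinates in order: 0, 1, ..., n - 1, 0, 1, ...
struct CyclicPolicy
{
  template<typename FunctionType>
  std::size_t NextCoordinate(const std::size_t iteration,
                             const std::vector<double>& /* x */,
                             const FunctionType& f) const
  {
    return iteration % f.NumFeatures();
  }
};

// Picks the coordinate with the largest partial gradient magnitude
// (the Gauss-Southwell rule).
struct GreedyPolicy
{
  template<typename FunctionType>
  std::size_t NextCoordinate(const std::size_t /* iteration */,
                             const std::vector<double>& x,
                             const FunctionType& f) const
  {
    std::size_t best = 0;
    double bestMagnitude = -1.0;
    for (std::size_t j = 0; j < f.NumFeatures(); ++j)
    {
      const double magnitude = std::abs(f.PartialGradient(x, j));
      if (magnitude > bestMagnitude)
      {
        bestMagnitude = magnitude;
        best = j;
      }
    }
    return best;
  }
};

// One coordinate is updated per iteration; the objective is only evaluated
// and checked against the tolerance every updateInterval iterations.
template<typename PolicyType, typename FunctionType>
double CoordinateDescentSketch(const FunctionType& f,
                               std::vector<double>& x,
                               const double stepSize,
                               const std::size_t maxIterations,
                               const double tolerance,
                               const std::size_t updateInterval,
                               PolicyType policy = PolicyType())
{
  assert(updateInterval > 0);
  double lastObjective = f.Evaluate(x);
  for (std::size_t i = 1; i <= maxIterations; ++i)
  {
    const std::size_t j = policy.NextCoordinate(i - 1, x, f);
    x[j] -= stepSize * f.PartialGradient(x, j);

    if (i % updateInterval == 0)
    {
      const double objective = f.Evaluate(x);
      if (std::abs(lastObjective - objective) < tolerance)
        return objective;
      lastObjective = objective;
    }
  }
  return f.Evaluate(x);
}
```

A call such as `CoordinateDescentSketch<GreedyPolicy>(f, x, 0.1, 100000, 1e-14, 30)` selects the policy through the first template argument, in the same spirit as the `DescentPolicyType` parameter of `CD`. Unlike `CD`, this sketch does not treat `maxIterations == 0` as "no limit".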