Merge pull request #379 from rcurtin/scd-to-cd
Rename `SCD` to `CD`
coatless authored Sep 29, 2023
2 parents b97c8b0 + 5bc145b commit 42cdf42
Showing 11 changed files with 144 additions and 120 deletions.
3 changes: 3 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
@@ -9,6 +9,9 @@
* Fix CNE test tolerances
([#360](https://github.com/mlpack/ensmallen/pull/360)).

* Rename `SCD` optimizer to `CD`
([#379](https://github.com/mlpack/ensmallen/pull/379)).

### ensmallen 2.19.1: "Eight Ball Deluxe"
###### 2023-01-30
* Avoid deprecation warnings in Armadillo 11.2+
2 changes: 1 addition & 1 deletion doc/function_types.md
@@ -307,7 +307,7 @@ regular implementation of the `Gradient()`, so that function may be omitted.
If these functions are implemented, the following partially differentiable
function optimizers can be used:

- [Stochastic Coordinate Descent](#stochastic-coordinate-descent-scd)
- [Coordinate Descent](#coordinate-descent-cd)

## Arbitrary separable functions

149 changes: 76 additions & 73 deletions doc/optimizers.md
@@ -778,6 +778,82 @@ optimizer2.Optimize(f, coordinates);
* [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
* [SGD](#standard-sgd)

## Coordinate Descent (CD)

*An optimizer for [partially differentiable functions](#partially-differentiable-functions).*

Coordinate descent is a technique for minimizing a function by doing a line
search along a single direction at the current point in the iteration. The
direction (or "coordinate") can be chosen cyclically, randomly, or in a greedy
fashion.

#### Constructors

* `CD<`_`DescentPolicyType`_`>()`
* `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)`
* `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)`
* `CD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)`

The _`DescentPolicyType`_ template parameter specifies the behavior of CD when
selecting the next coordinate to descend along. The `RandomDescent`,
`GreedyDescent`, and `CyclicDescent` classes are available for use. Custom
behavior can be achieved by implementing a class with the same method
signatures.

For convenience, the following typedefs have been defined:

* `RandomCD` (equivalent to `CD<RandomDescent>`): selects coordinates randomly
* `GreedyCD` (equivalent to `CD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
* `CyclicCD` (equivalent to `CD<CyclicDescent>`): selects coordinates sequentially

***Note***: `CD` used to be called `SCD`. Use of the name `SCD` is deprecated,
and the name will be removed in ensmallen 3.

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |

Attributes of the optimizer may also be modified via the member methods
`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
`DescentPolicy()`.

Note that the default value for `descentPolicy` is the default constructor for
_`DescentPolicyType`_.

#### Examples

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
SparseTestFunction f;
arma::mat coordinates = f.GetInitialPoint();

// Arguments: stepSize, maxIterations, tolerance, updateInterval.
// Optimize() updates `coordinates` in place, so each later run starts from
// the result of the previous one.
RandomCD randomcd(0.01, 100000, 1e-5, 1e3);
randomcd.Optimize(f, coordinates);

GreedyCD greedycd(0.01, 100000, 1e-5, 1e3);
greedycd.Optimize(f, coordinates);

CyclicCD cycliccd(0.01, 100000, 1e-5, 1e3);
cycliccd.Optimize(f, coordinates);
```

</details>
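
To make the role of the descent policy concrete, the following is a minimal, self-contained sketch of the idea in plain C++. It is not ensmallen's implementation: `SeparableQuadratic`, `CyclicPolicy`, `GreedyPolicy`, the `Next()` method, and `CoordinateDescentSketch()` are all hypothetical names invented for this illustration, `std::vector<double>` stands in for an Armadillo matrix, and a fixed-step update along one coordinate is assumed in place of a line search. The parameters mirror `stepSize`, `maxIterations`, `tolerance`, and `updateInterval` from the table above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical objective (not part of ensmallen):
//   f(x) = sum_j weights[j] * (x[j] - targets[j])^2
struct SeparableQuadratic
{
  std::vector<double> weights;
  std::vector<double> targets;

  std::size_t NumFeatures() const { return weights.size(); }

  double Evaluate(const std::vector<double>& x) const
  {
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j)
      sum += weights[j] * (x[j] - targets[j]) * (x[j] - targets[j]);
    return sum;
  }

  // Derivative of f with respect to coordinate j only.
  double PartialGradient(const std::vector<double>& x,
                         const std::size_t j) const
  {
    return 2.0 * weights[j] * (x[j] - targets[j]);
  }
};

// Hypothetical cyclic policy: visit coordinates 0, 1, ..., n - 1, 0, 1, ...
struct CyclicPolicy
{
  template<typename FunctionType>
  std::size_t Next(const std::size_t iteration,
                   const std::vector<double>& /* x */,
                   const FunctionType& f) const
  {
    return iteration % f.NumFeatures();
  }
};

// Hypothetical greedy policy: pick the coordinate whose partial gradient has
// the largest magnitude (ties go to the lowest index).
struct GreedyPolicy
{
  template<typename FunctionType>
  std::size_t Next(const std::size_t /* iteration */,
                   const std::vector<double>& x,
                   const FunctionType& f) const
  {
    std::size_t best = 0;
    double bestMagnitude = -1.0;
    for (std::size_t j = 0; j < f.NumFeatures(); ++j)
    {
      const double magnitude = std::abs(f.PartialGradient(x, j));
      if (magnitude > bestMagnitude)
      {
        bestMagnitude = magnitude;
        best = j;
      }
    }
    return best;
  }
};

// Fixed-step coordinate descent; returns the last objective value computed.
// A maxIterations of 0 is treated as "no limit".
template<typename PolicyType, typename FunctionType>
double CoordinateDescentSketch(const FunctionType& f,
                               std::vector<double>& x,
                               const double stepSize,
                               const std::size_t maxIterations,
                               const double tolerance,
                               const std::size_t updateInterval,
                               const PolicyType policy = PolicyType())
{
  assert(x.size() == f.NumFeatures());
  assert(updateInterval > 0);

  double lastObjective = f.Evaluate(x);
  for (std::size_t i = 1; maxIterations == 0 || i <= maxIterations; ++i)
  {
    // Ask the policy for a coordinate, then step along that coordinate only.
    const std::size_t j = policy.Next(i - 1, x, f);
    x[j] -= stepSize * f.PartialGradient(x, j);

    // Only evaluate the full objective every updateInterval iterations.
    if (i % updateInterval == 0)
    {
      const double objective = f.Evaluate(x);
      if (!std::isfinite(objective))
        return objective; // Diverged; a smaller step size may help.
      if (std::abs(lastObjective - objective) < tolerance)
        return objective; // Change is within tolerance.
      lastObjective = objective;
    }
  }

  return f.Evaluate(x);
}
```

Under these assumptions, calling `CoordinateDescentSketch<CyclicPolicy>(f, x, 0.1, 10000, 1e-12, 10)` on a two-coordinate `SeparableQuadratic` drives `x` toward `targets`, and swapping in `GreedyPolicy` changes only the order in which coordinates are updated. Keeping the selection rule in a separate policy type is what lets one optimizer class cover the random, greedy, and cyclic variants.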

#### See also:

* [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent)
* [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf)
* [Partially differentiable functions](#partially-differentiable-functions)

## CMAES

*An optimizer for [separable functions](#separable-functions).*
@@ -2807,79 +2883,6 @@ optimizer.Optimize(f, coordinates);
* [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
* [Differentiable separable functions](#differentiable-separable-functions)
## Stochastic Coordinate Descent (SCD)

*An optimizer for [partially differentiable functions](#partially-differentiable-functions).*

Stochastic Coordinate descent is a technique for minimizing a function by
doing a line search along a single direction at the current point in the
iteration. The direction (or "coordinate") can be chosen cyclically, randomly
or in a greedy fashion.

#### Constructors

* `SCD<`_`DescentPolicyType`_`>()`
* `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations`_`)`
* `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval`_`)`
* `SCD<`_`DescentPolicyType`_`>(`_`stepSize, maxIterations, tolerance, updateInterval, descentPolicy`_`)`

The _`DescentPolicyType`_ template parameter specifies the behavior of SCD when
selecting the next coordinate to descend with. The `RandomDescent`,
`GreedyDescent`, and `CyclicDescent` classes are available for use. Custom
behavior can be achieved by implementing a class with the same method
signatures.

For convenience, the following typedefs have been defined:

* `RandomSCD` (equivalent to `SCD<RandomDescent>`): selects coordinates randomly
* `GreedySCD` (equivalent to `SCD<GreedyDescent>`): selects the coordinate with the maximum guaranteed descent according to the Gauss-Southwell rule
* `CyclicSCD` (equivalent to `SCD<CyclicDescent>`): selects coordinates sequentially

#### Attributes

| **type** | **name** | **description** | **default** |
|----------|----------|-----------------|-------------|
| `double` | **`stepSize`** | Step size for each iteration. | `0.01` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `double` | **`tolerance`** | Maximum absolute tolerance to terminate the algorithm. | `1e-5` |
| `size_t` | **`updateInterval`** | The interval at which the objective is to be reported and checked for convergence. | `1e3` |
| `DescentPolicyType` | **`descentPolicy`** | The policy to use for selecting the coordinate to descend on. | `DescentPolicyType()` |

Attributes of the optimizer may also be modified via the member methods
`StepSize()`, `MaxIterations()`, `Tolerance()`, `UpdateInterval()`, and
`DescentPolicy()`.

Note that the default value for `descentPolicy` is the default constructor for
_`DescentPolicyType`_.

#### Examples

<details open>
<summary>Click to collapse/expand example code.
</summary>

```c++
SparseTestFunction f;
arma::mat coordinates = f.GetInitialPoint();

RandomSCD randomscd(0.01, 100000, 1e-5, 1e3);
randomscd.Optimize(f, coordinates);

GreedySCD greedyscd(0.01, 100000, 1e-5, 1e3);
greedyscd.Optimize(f, coordinates);

CyclicSCD cyclicscd(0.01, 100000, 1e-5, 1e3);
cyclicscd.Optimize(f, coordinates);
```
</details>

#### See also:

* [Coordinate descent on Wikipedia](https://en.wikipedia.org/wiki/Coordinate_descent)
* [Stochastic Methods for L1-Regularized Loss Minimization](https://www.jmlr.org/papers/volume12/shalev-shwartz11a/shalev-shwartz11a.pdf)
* [Partially differentiable functions](#partially-differentiable-functions)

## Stochastic Gradient Descent with Restarts (SGDR)

*An optimizer for [differentiable separable
2 changes: 1 addition & 1 deletion include/ensmallen.hpp
@@ -98,6 +98,7 @@
#include "ensmallen_bits/bigbatch_sgd/bigbatch_sgd.hpp"
#include "ensmallen_bits/cmaes/cmaes.hpp"
#include "ensmallen_bits/cmaes/active_cmaes.hpp"
#include "ensmallen_bits/cd/cd.hpp"
#include "ensmallen_bits/cne/cne.hpp"
#include "ensmallen_bits/de/de.hpp"
#include "ensmallen_bits/eve/eve.hpp"
@@ -119,7 +120,6 @@

#include "ensmallen_bits/sa/sa.hpp"
#include "ensmallen_bits/sarah/sarah.hpp"
#include "ensmallen_bits/scd/scd.hpp"
#include "ensmallen_bits/sdp/sdp.hpp"
#include "ensmallen_bits/sdp/lrsdp.hpp"
#include "ensmallen_bits/sdp/primal_dual.hpp"
@@ -1,16 +1,16 @@
/**
* @file scd.hpp
* @file cd.hpp
* @author Shikhar Bhardwaj
*
* Stochastic Coordinate Descent (SCD).
* Coordinate Descent (CD).
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_SCD_SCD_HPP
#define ENSMALLEN_SCD_SCD_HPP
#ifndef ENSMALLEN_CD_CD_HPP
#define ENSMALLEN_CD_CD_HPP

#include "descent_policies/cyclic_descent.hpp"
#include "descent_policies/random_descent.hpp"
@@ -42,19 +42,19 @@ namespace ens {
* }
* @endcode
*
* SCD can optimize partially differentiable functions. For more details, see
* CD can optimize partially differentiable functions. For more details, see
* the documentation on function types included with this distribution or on the
* ensmallen website.
*
* @tparam DescentPolicyType Descent policy to decide the order in which the
* coordinate for descent is selected.
*/
template <typename DescentPolicyType = RandomDescent>
class SCD
class CD
{
public:
/**
* Construct the SCD optimizer with the given function and parameters. The
* Construct the CD optimizer with the given function and parameters. The
* default values here are not necessarily good for every problem, so it is
* suggested that the values used are tailored for the task at hand. The
* maximum number of iterations refers to the maximum number of "descents"
@@ -70,11 +70,11 @@ class SCD
* @param descentPolicy The policy to use for picking up the coordinate to
* descend on.
*/
SCD(const double stepSize = 0.01,
const size_t maxIterations = 100000,
const double tolerance = 1e-5,
const size_t updateInterval = 1e3,
const DescentPolicyType descentPolicy = DescentPolicyType());
CD(const double stepSize = 0.01,
const size_t maxIterations = 100000,
const double tolerance = 1e-5,
const size_t updateInterval = 1e3,
const DescentPolicyType descentPolicy = DescentPolicyType());

/**
* Optimize the given function using coordinate descent. The
@@ -158,6 +158,24 @@ class SCD
} // namespace ens

// Include implementation.
#include "scd_impl.hpp"
#include "cd_impl.hpp"

namespace ens {

/**
* Backwards-compatibility alias; this can be removed after ensmallen 3.10.0.
* The history here is that CD was originally named SCD, but that is an
* inaccurate name because this is not a stochastic technique; thus, it was
* renamed to CD.
*/
template<typename DescentPolicyType = RandomDescent>
using SCD = CD<DescentPolicyType>;

// Convenience typedefs.
using RandomCD = CD<RandomDescent>;
using GreedyCD = CD<GreedyDescent>;
using CyclicCD = CD<CyclicDescent>;

} // namespace ens

#endif
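
The backwards-compatibility approach in this header can be illustrated in isolation. The sketch below is a simplified stand-in, not the ensmallen code: the `sketch` namespace, the empty policy structs, and the one-parameter constructor are assumptions made for brevity. It shows that an alias template with a default argument makes the old and new names refer to the same type.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Empty stand-ins for the descent policy classes.
struct RandomDescent { };
struct GreedyDescent { };

// Stand-in for the renamed optimizer; only one attribute is modeled here.
template<typename DescentPolicyType = RandomDescent>
class CD
{
 public:
  explicit CD(const double stepSizeIn = 0.01) : stepSize(stepSizeIn)
  {
    assert(stepSize > 0.0);
  }

  double StepSize() const { return stepSize; }

 private:
  double stepSize;
};

// The old name is kept as an alias template, so SCD<T> *is* CD<T>.
template<typename DescentPolicyType = RandomDescent>
using SCD = CD<DescentPolicyType>;

// Convenience typedef under the new name.
using GreedyCD = CD<GreedyDescent>;

// Both names resolve to the same types, including the defaulted policy.
static_assert(std::is_same<SCD<>, CD<>>::value,
    "SCD<> and CD<> should be the same type");
static_assert(std::is_same<SCD<GreedyDescent>, GreedyCD>::value,
    "SCD<GreedyDescent> and GreedyCD should be the same type");

} // namespace sketch
```

Because an alias introduces no new type, code that declares `sketch::SCD<> opt(0.05);` can pass `opt` to anything expecting a `sketch::CD<>&` without conversion, which is why existing user code should keep compiling until the alias is removed.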
@@ -1,26 +1,26 @@
/**
* @file scd_impl.hpp
* @file cd_impl.hpp
* @author Shikhar Bhardwaj
*
* Implementation of stochastic coordinate descent.
* Implementation of coordinate descent.
*
* ensmallen is free software; you may redistribute it and/or modify it under
* the terms of the 3-clause BSD license. You should have received a copy of
* the 3-clause BSD license along with ensmallen. If not, see
* http://www.opensource.org/licenses/BSD-3-Clause for more information.
*/
#ifndef ENSMALLEN_SCD_SCD_IMPL_HPP
#define ENSMALLEN_SCD_SCD_IMPL_HPP
#ifndef ENSMALLEN_CD_CD_IMPL_HPP
#define ENSMALLEN_CD_CD_IMPL_HPP

// In case it hasn't been included yet.
#include "scd.hpp"
#include "cd.hpp"

#include <ensmallen_bits/function.hpp>

namespace ens {

template <typename DescentPolicyType>
SCD<DescentPolicyType>::SCD(
CD<DescentPolicyType>::CD(
const double stepSize,
const size_t maxIterations,
const double tolerance,
@@ -41,7 +41,7 @@ template <typename ResolvableFunctionType,
typename... CallbackTypes>
typename std::enable_if<IsArmaType<GradType>::value,
typename MatType::elem_type>::type
SCD<DescentPolicyType>::Optimize(
CD<DescentPolicyType>::Optimize(
ResolvableFunctionType& function,
MatType& iterateIn,
CallbackTypes&&... callbacks)
@@ -94,12 +94,12 @@ SCD<DescentPolicyType>::Optimize(
overallObjective, callbacks...);

// Output current objective function.
Info << "SCD: iteration " << i << ", objective " << overallObjective
Info << "CD: iteration " << i << ", objective " << overallObjective
<< "." << std::endl;

if (std::isnan(overallObjective) || std::isinf(overallObjective))
{
Warn << "SCD: converged to " << overallObjective << "; terminating"
Warn << "CD: converged to " << overallObjective << "; terminating"
<< " with failure. Try a smaller step size?" << std::endl;

Callback::EndOptimization(*this, function, iterate, callbacks...);
@@ -108,7 +108,7 @@ SCD<DescentPolicyType>::Optimize(

if (std::abs(lastObjective - overallObjective) < tolerance)
{
Info << "SCD: minimized within tolerance " << tolerance << "; "
Info << "CD: minimized within tolerance " << tolerance << "; "
<< "terminating optimization." << std::endl;

Callback::EndOptimization(*this, function, iterate, callbacks...);
@@ -119,7 +119,7 @@ SCD<DescentPolicyType>::Optimize(
}
}

Info << "SCD: maximum iterations (" << maxIterations << ") reached; "
Info << "CD: maximum iterations (" << maxIterations << ") reached; "
<< "terminating optimization." << std::endl;

// Calculate and return final objective.
2 changes: 1 addition & 1 deletion tests/CMakeLists.txt
@@ -11,6 +11,7 @@ set(ENSMALLEN_TESTS_SOURCES
aug_lagrangian_test.cpp
bigbatch_sgd_test.cpp
callbacks_test.cpp
cd_test.cpp
cmaes_test.cpp
cne_test.cpp
de_test.cpp
@@ -39,7 +40,6 @@ set(ENSMALLEN_TESTS_SOURCES
rmsprop_test.cpp
sa_test.cpp
sarah_test.cpp
scd_test.cpp
sdp_primal_dual_test.cpp
sgdr_test.cpp
sgd_test.cpp