- In deep learning, the optimization problems we are dealing with are generally nonconvex. However, the analysis of convex problems can provide useful insight into how to approach optimization.
- An optimization problem is convex if the objective function is convex, the inequality constraint functions are convex, and the equality constraints are affine.
- For any a, b ∈ C, if every point on the line segment connecting a and b lies within C, then we say C is a convex set.
- If X and Y are convex sets, then X ∩ Y is also convex.
- Unions of convex sets need not be convex.
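A quick numerical sanity check of both facts (a minimal NumPy sketch; the particular sets and the sampling-based test are illustrative assumptions):

```python
import numpy as np

# Two convex sets in R^2, defined by membership tests:
# X = the unit disk, Y = the half-plane { p : p[0] >= 0 }.
in_X = lambda p: np.linalg.norm(p) <= 1.0
in_Y = lambda p: p[0] >= 0.0
in_XY = lambda p: in_X(p) and in_Y(p)   # the intersection X ∩ Y

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if in_XY(a) and in_XY(b):
        t = rng.uniform()
        # Every point on the segment between a and b stays in X ∩ Y.
        assert in_XY(t * a + (1 - t) * b)

# A union counterexample on the real line: [-2, -1] ∪ [1, 2].
# Both -1 and 1 are in the union, but their midpoint 0 is not.
in_U = lambda x: -2 <= x <= -1 or 1 <= x <= 2
assert in_U(-1) and in_U(1) and not in_U(0)
```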
- Jensen's Inequality
The expectation of a convex function is no less than the convex function of an expectation: for a convex function f,
Σ_i α_i f(x_i) ≥ f( Σ_i α_i x_i ), where α_i ≥ 0 and Σ_i α_i = 1,
or equivalently, E[f(X)] ≥ f(E[X]).
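A small numerical check of the inequality (a minimal sketch; the convex function exp and the sampled weights are illustrative assumptions):

```python
import numpy as np

# Check Jensen's inequality for the convex function f(x) = exp(x):
# sum_i alpha_i f(x_i) >= f(sum_i alpha_i x_i).
f = np.exp
x = np.array([-1.0, 0.5, 2.0, 3.0])

alpha = np.random.default_rng(0).random(4)
alpha /= alpha.sum()          # alpha_i >= 0 and sum_i alpha_i = 1

lhs = np.sum(alpha * f(x))    # expectation of the convex function
rhs = f(np.sum(alpha * x))    # convex function of the expectation
assert lhs >= rhs
```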
- A function A is affine if there is a linear function L and a vector b such that
A(x) = L(x) + b for all x.
Basically, an affine function is just a linear function plus a translation.
* Reference : Linear and Affine Functions.
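For instance (a minimal NumPy sketch; the matrix L and offset b are arbitrary assumptions):

```python
import numpy as np

# An affine map A(x) = L(x) + b: a linear map (here a matrix multiply)
# followed by a translation by the vector b.
L = np.array([[2.0, 0.0],
              [1.0, 3.0]])
b = np.array([1.0, -1.0])

def A(x):
    return L @ x + b

# A is not linear (A(0) != 0), but A(x) - A(0) is linear (additive):
x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
zero = np.zeros(2)
assert not np.allclose(A(zero), 0.0)
assert np.allclose(A(x + y) - A(zero),
                   (A(x) - A(zero)) + (A(y) - A(zero)))
```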
- The local minima of convex functions are also the global minima.
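A quick illustration (a minimal sketch; the quadratic objective, step size, and starting points are illustrative assumptions): gradient descent on a convex function reaches the same global minimum from any initialization.

```python
# f(x) = (x - 3)^2 is convex, with its unique (global) minimum at x = 3.
grad = lambda x: 2 * (x - 3.0)

for x0 in (-10.0, 0.0, 25.0):
    x = x0
    for _ in range(200):
        x -= 0.1 * grad(x)     # fixed step size, illustrative only
    assert abs(x - 3.0) < 1e-6 # every start converges to the same minimum
```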
- Any below (sublevel) set of a convex function is a convex set.
* Image Credit : http://agbs.kyb.tuebingen.mpg.de/lwk/
- A twice-differentiable function is convex if and only if its Hessian (the matrix of second derivatives) is positive semidefinite.
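We can test this numerically (a minimal NumPy sketch; the construction Q = AᵀA is an assumption chosen to guarantee positive semidefiniteness):

```python
import numpy as np

# For f(x) = 0.5 * x^T Q x with Q = A^T A, the Hessian is Q, which is
# positive semidefinite by construction, so f is convex.
A = np.random.default_rng(0).normal(size=(3, 3))
Q = A.T @ A                       # Hessian of f

eigvals = np.linalg.eigvalsh(Q)   # eigvalsh: eigenvalues of a symmetric matrix
assert np.all(eigvals >= -1e-10)  # PSD: all eigenvalues are nonnegative
```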
- In general, solving a constrained optimization problem is difficult. But we can rewrite the problem in the equivalent Lagrangian form to make the optimization process simpler to deal with.
- To do so, we add the constraint conditions ( g_i(x) ≤ 0 ) to the objective function as penalty terms ( λ_i g_i(x) ).
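A minimal sketch of this penalty idea on a toy problem (the objective, constraint, fixed multiplier λ, and step size are all illustrative assumptions):

```python
# Minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
# Lagrangian form: L(x, lam) = f(x) + lam * g(x), with lam fixed here.
lam = 2.0                       # at lam = 2 the optimality conditions hold exactly

def grad_L(x):
    return 2 * (x - 2.0) + lam  # d/dx [ (x - 2)^2 + lam * (x - 1) ]

x = 0.0
for _ in range(200):
    x -= 0.1 * grad_L(x)        # plain gradient descent on the penalized objective

print(x)  # ~1.0, the constrained minimizer (the unconstrained one is x = 2)
```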
- In §4.5, we've used this trick when introducing the L1 & L2 regularization methods.
- We add the L1 norm ( λ/2 |w|_1 ) or the squared L2 norm ( λ/2 ||w||_2² ) of the network weight parameters to the raw cost function, hoping to find a workable model subject to smaller weights.
- From the constrained-optimization point of view, this ensures that
|w|_1 ≤ r (for L1 regularization), and
||w||_2² ≤ r² (for L2 regularization)
for some r (see the cyan region in the figure below). Adjusting the regularization parameter λ changes the constrained region of the problem (a larger λ corresponds to a smaller r).
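A minimal sketch of adding such a penalty to a raw cost (plain linear regression with NumPy here, not a network; the data and the value of λ are illustrative assumptions):

```python
import numpy as np

# Synthetic regression data and a weight vector to fit.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = np.zeros(5)
lam, lr = 0.1, 0.01

for _ in range(500):
    grad_raw = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the raw cost
    grad_l2 = lam * w                          # from the (lam/2) * ||w||_2^2 penalty
    # For L1, the penalty lam * |w|_1 contributes lam * np.sign(w) instead.
    w -= lr * (grad_raw + grad_l2)

print(np.linalg.norm(w))  # the penalty keeps the weights small
```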