fix grammar in issue ml4a#42
KushGabani authored Nov 25, 2021
1 parent 555d3f6 commit fc50337
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _book/how_neural_networks_are_trained.md
@@ -167,7 +167,7 @@ So how do we actually calculate where that point at the bottom is exactly? There

## The curse of nonlinearity

- Alas, ordinary least squares cannot be used to optimize neural networks however, and so solving the above linear regression will be left as an exercise left to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; Recall the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma (b + W^\top X) $$.
+ Alas, ordinary least squares cannot be used to optimize neural networks, so solving the above linear regression will be left as an exercise to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; recall that the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma(b + W^\top X)$$.
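To make the contrast concrete, here is a minimal sketch in numpy (the weight vector, bias, and input below are made-up toy values, not code from the book): the two models differ only in the sigmoid wrapped around the same affine term.

```python
import numpy as np

def sigmoid(z):
    # squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=3)   # toy weight vector for 3 inputs
b = 0.5                  # toy bias
X = rng.normal(size=3)   # a single toy input example

linear_y = b + W @ X            # y = b + W^T X
neuron_y = sigmoid(b + W @ X)   # f(x) = sigma(b + W^T X)

print(linear_y, neuron_y)       # same affine term, squashed vs. not
```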

This nonlinearity means that the parameters do not act independently of each other in influencing the shape of the loss function. Rather than having a bowl shape, the loss function of a neural network is more complicated. It is bumpy and full of hills and troughs. The property of being "bowl-shaped" is called [convexity](https://en.wikipedia.org/wiki/Convex_function), and it is a highly prized convenience in multi-parameter optimization. A convex loss function ensures we have a global minimum (the bottom of the bowl), and that all roads downhill lead to it.
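One way to see this loss of convexity directly is to sweep a single weight and compare the loss curves of the two models. A minimal sketch with made-up toy data (the x and y values are illustrative, not from the book):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy 1-D dataset (illustrative values only)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([ 0.1,  0.2, 0.5, 0.8, 0.9])

ws = np.linspace(-10, 10, 401)  # sweep a single weight w

# squared-error loss of the linear model w * x: quadratic in w, hence convex
linear_loss = np.array([np.mean((w * x - y) ** 2) for w in ws])

# squared-error loss of sigmoid(w * x): plateaus at large |w|, hence not convex
sigmoid_loss = np.array([np.mean((sigmoid(w * x) - y) ** 2) for w in ws])

# a convex curve sampled on a uniform grid has non-negative second differences
print(np.all(np.diff(linear_loss, 2) >= 0))   # True: a single bowl
print(np.all(np.diff(sigmoid_loss, 2) >= 0))  # False: flat shoulders and dips
```

Plotted, the first curve is a single parabola, so every road downhill leads to its minimum; the second flattens out at both extremes, something a non-constant convex function cannot do.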

