fix grammar in issue ml4a#42
KushGabani authored Nov 25, 2021
1 parent 555d3f6 commit fc50337
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _book/how_neural_networks_are_trained.md
@@ -167,7 +167,7 @@ So how do we actually calculate where that point at the bottom is exactly? There

## The curse of nonlinearity

- Alas, ordinary least squares cannot be used to optimize neural networks however, and so solving the above linear regression will be left as an exercise left to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; Recall the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma (b + W^\top X) $$.
+ Alas, ordinary least squares cannot be used to optimize neural networks, so solving the above linear regression will be left as an exercise to the reader. The reason we cannot use linear regression is that neural networks are nonlinear; recall that the essential difference between the linear equations we posed and a neural network is the presence of the activation function (e.g. sigmoid, tanh, ReLU, or others). Thus, whereas the linear equation above is simply $$y = b + W^\top X$$, a 1-layer neural network with a sigmoid activation function would be $$f(x) = \sigma(b + W^\top X)$$.
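To make the contrast concrete, here is a minimal sketch in numpy (the weight vector, bias, and input below are made-up toy values, not code from the book): the two models differ only in the sigmoid wrapped around the same affine term.

```python
import numpy as np

def sigmoid(z):
    # squashes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=3)   # toy weight vector for 3 inputs
b = 0.5                  # toy bias
X = rng.normal(size=3)   # a single toy input example

linear_y = b + W @ X            # y = b + W^T X
neuron_y = sigmoid(b + W @ X)   # f(x) = sigma(b + W^T X)

print(linear_y, neuron_y)       # same affine term, squashed vs. not
```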

This nonlinearity means that the parameters do not act independently of each other in influencing the shape of the loss function. Rather than having a bowl shape, the loss function of a neural network is more complicated. It is bumpy and full of hills and troughs. The property of being "bowl-shaped" is called [convexity](https://en.wikipedia.org/wiki/Convex_function), and it is a highly prized convenience in multi-parameter optimization. A convex loss function ensures we have a global minimum (the bottom of the bowl), and that all roads downhill lead to it.
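One way to see this loss of convexity directly is to sweep a single weight and compare the loss curves of the two models. A minimal sketch with made-up toy data (the x and y values are illustrative, not from the book):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy 1-D dataset (illustrative values only)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([ 0.1,  0.2, 0.5, 0.8, 0.9])

ws = np.linspace(-10, 10, 401)  # sweep a single weight w

# squared-error loss of the linear model w * x: quadratic in w, hence convex
linear_loss = np.array([np.mean((w * x - y) ** 2) for w in ws])

# squared-error loss of sigmoid(w * x): plateaus at large |w|, hence not convex
sigmoid_loss = np.array([np.mean((sigmoid(w * x) - y) ** 2) for w in ws])

# a convex curve sampled on a uniform grid has non-negative second differences
print(np.all(np.diff(linear_loss, 2) >= 0))   # True: a single bowl
print(np.all(np.diff(sigmoid_loss, 2) >= 0))  # False: flat shoulders and dips
```

Plotted, the first curve is a single parabola, so every road downhill leads to its minimum; the second flattens out at both extremes, something a non-constant convex function cannot do.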

