When $i = k$,
$$ \begin{aligned} \partial y_i / \partial x_k =& \frac{\exp(x_i)[\sum_j \exp(x_j)] - \exp(x_i)\exp(x_i)}{[\sum_j \exp(x_j)]^2} \\ =& \frac{\exp(x_i) \left([\sum_j \exp(x_j)] - \exp(x_i)\right)}{[\sum_j \exp(x_j)]^2} \\ =& \frac{\exp(x_i)}{\sum_j \exp(x_j)} \cdot \frac{[\sum_j \exp(x_j)] - \exp(x_i)}{\sum_j \exp(x_j)} \\ =& y_i \cdot (1 - y_i) \\ =& y_i - y_i y_k \quad (\text{since } k = i) \end{aligned} $$
When $i \not= k$,
$$ \begin{aligned} \partial y_i / \partial x_k =& \frac{0 - \exp(x_i)\exp(x_k)}{[\sum_j \exp(x_j)]^2} \\ =& - y_i y_k \end{aligned} $$
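The two cases can be written compactly with the Kronecker delta $\delta_{ik}$, which is exactly what the Jacobian below packs into a matrix:

$$ \partial y_i / \partial x_k = y_i (\delta_{ik} - y_k), \qquad \delta_{ik} = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases} $$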
As a result, the Jacobian matrix
$$ J_x y = \begin{bmatrix} y_1 - y_1 y_1 & - y_1 y_2 & ... & - y_1 y_n \\ - y_2 y_1 & y_2 - y_2 y_2 & ... & - y_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ - y_n y_1 & - y_n y_2 & ... & y_n - y_n y_n \end{bmatrix} = \operatorname{diag}(y) - y y^T $$
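To make the result concrete, here is a small NumPy sketch (the names `softmax_jacobian` and `softmax_vjp` are illustrative, not part of the library) that builds $\operatorname{diag}(y) - y y^T$ and the equivalent vector-Jacobian product a backward pass would actually use:

```python
import numpy as np

def softmax_jacobian(y):
    """Full Jacobian of the softmax outputs y w.r.t. its inputs: diag(y) - y y^T."""
    y = np.asarray(y)
    return np.diag(y) - np.outer(y, y)

def softmax_vjp(y, grad):
    """Vector-Jacobian product J^T grad without materializing the n x n matrix.

    Because J = diag(y) - y y^T is symmetric, J^T grad = y * (grad - grad . y).
    """
    y, grad = np.asarray(y), np.asarray(grad)
    return y * (grad - np.dot(grad, y))

# Quick numerical check that both forms agree.
x = np.array([2.0, 1.0, 0.1])
y = np.exp(x - x.max()); y /= y.sum()
grad = np.array([1.0, -0.5, 0.3])
assert np.allclose(softmax_jacobian(y).T @ grad, softmax_vjp(y, grad))
```

In practice the backward pass uses the vector-Jacobian form, since building the full $n \times n$ matrix per sample is unnecessary.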
`forward(inputs, feedbacks=None)`

`stable_softmax(inputs, axis)`
Since some of the exponentials can be very large (large enough to overflow in floating point), we can use the equivalent form
$$ y_i(x) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} = \frac{\exp(x_i - m)}{\sum_j \exp(x_j - m)} $$
where $m = \max(x_1, x_2, ...)$.
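A minimal sketch of what `stable_softmax(inputs, axis)` could look like in NumPy, assuming array inputs (an illustration of the trick above, not necessarily the library's exact code):

```python
import numpy as np

def stable_softmax(inputs, axis=-1):
    # Subtracting the per-axis maximum leaves the result unchanged
    # (numerator and denominator are both scaled by exp(-m)) but keeps
    # every exponent <= 0, so exp() cannot overflow.
    inputs = np.asarray(inputs, dtype=float)
    m = np.max(inputs, axis=axis, keepdims=True)
    exps = np.exp(inputs - m)
    return exps / np.sum(exps, axis=axis, keepdims=True)
```

For example, `stable_softmax([1000.0, 1000.5, 999.0])` returns a proper probability vector, whereas exponentiating the raw inputs would overflow to `inf`.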