Neural Network
Neural network project covering theory, structure, backpropagation, and coding in Python

Neural Network Theory
Neural Network Structure:
backward pass for w:
backward pass for b:
backward pass for w mathematically:
\[L = \underbrace{L\Big(\underbrace{\hat{y}\big(\underbrace{z(Xw + b)}_{\frac{\partial z}{\partial w}}\big)}_{\frac{\partial \hat{y}}{\partial z}}\Big)}_{\frac{\partial L}{\partial \hat{y}}}\]
\[\frac{\partial L}{\partial w} = \underbrace{\frac{\partial L}{\partial z}}_{\frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}} \cdot \frac{\partial z}{\partial w}\]
As a summary:
\[L = \underbrace{L\left(\hat{y}\right)}_{\frac{\partial L}{\partial \hat{y}}} = \underbrace{L\left(\hat{y}\left(z\right)\right)}_{\frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}} = \underbrace{L\left(\hat{y}\left(z(XW + b)\right)\right)}_{\frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial W}}\]
Or more explicitly for the chain rule:
\[\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \hat{y}} \times \frac{\partial \hat{y}}{\partial z} \times \frac{\partial z}{\partial W}\]
\(z = XW + b\)
\(\hat{y} = f(z)\) (activation function, e.g. softmax)
\(L = \mathrm{loss}(\hat{y}, y)\)
The derivatives \(\frac{\partial L}{\partial \hat{y}}\), \(\frac{\partial \hat{y}}{\partial z}\), and \(\frac{\partial z}{\partial W}\) represent the gradient chain from output all the way down to weights.
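For the softmax activation and cross-entropy loss used in the code below, this chain collapses to a particularly simple form. This is a standard result, stated here for reference and assuming one-hot targets \(y\) (with the batch averaging used in the code, each term is additionally scaled by \(1/N\)):
\[\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} = \hat{y} - y, \qquad \frac{\partial L}{\partial W} = X^{\top}(\hat{y} - y), \qquad \frac{\partial L}{\partial b} = \sum_{i}\left(\hat{y}_i - y_i\right)\]
This gives a useful sanity check for the gradients computed layer by layer in the next section.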
Coding the Neural Network
Neural Network Structure:
\[\frac{\partial L}{\partial w} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}\]
where:
Loss derivative: \(\frac{\partial L}{\partial \hat{y}}\)
Activation derivative: \(\frac{\partial \hat{y}}{\partial z}\)
Linear model derivative: \(\frac{\partial z}{\partial w}\)
1. Loss derivative \(\left(\frac{\partial L}{\partial \hat{y}}\right)\):
This is computed in:
def backward(self) -> np.ndarray:  # CrossEntropy
    grad = -self.target / self.prediction / self.target.shape[0]
    return grad
self.prediction is \(\hat{y}\) (the output of the softmax). This gives the gradient of the loss with respect to the softmax output \(\hat{y}\).
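For context, a minimal sketch of how the full CrossEntropy loss could look, assuming one-hot targets. The class name, the __call__/forward signatures, and the eps guard are assumptions; only the backward above is taken from the project:

import numpy as np

class CrossEntropy:
    def __call__(self, prediction: np.ndarray, target: np.ndarray) -> float:
        return self.forward(prediction, target)

    def forward(self, prediction: np.ndarray, target: np.ndarray) -> float:
        # Cache the inputs so backward() can reuse them.
        self.prediction = prediction
        self.target = target
        eps = 1e-12  # guard against log(0)
        # Mean cross-entropy over the batch.
        return float(-np.sum(target * np.log(prediction + eps)) / target.shape[0])

    def backward(self) -> np.ndarray:
        # dL/d y_hat = -y / y_hat, averaged over the batch (same as the snippet above).
        return -self.target / self.prediction / self.target.shape[0]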
2. Activation derivative \(\left(\frac{\partial \hat{y}}{\partial z}\right)\):
In the Softmax layer:
def backward(self, up_grad: np.ndarray) -> np.ndarray:  # Softmax
    ...
    down_grad[i] = np.dot(jacobian, up_grad[i])
up_grad is \(\frac{\partial L}{\partial \hat{y}}\) from the loss. The softmax Jacobian gives \(\frac{\partial \hat{y}}{\partial z}\).
Output:
down_grad is \(\frac{\partial L}{\partial z}\). This matches step 2 of the chain rule.
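Filled in, the Softmax layer could look roughly like this. This is a sketch: the forward implementation, the self.out attribute, and subclassing the Layer base class shown further below are assumptions; the Jacobian product line is the one quoted above:

import numpy as np

class Softmax(Layer):
    def forward(self, inp: np.ndarray) -> np.ndarray:
        self.inp = inp
        # Subtract the row-wise max for numerical stability.
        exp = np.exp(inp - np.max(inp, axis=1, keepdims=True))
        self.out = exp / np.sum(exp, axis=1, keepdims=True)
        return self.out

    def backward(self, up_grad: np.ndarray) -> np.ndarray:
        down_grad = np.zeros_like(up_grad)
        for i in range(self.out.shape[0]):               # one sample at a time
            s = self.out[i].reshape(-1, 1)
            jacobian = np.diagflat(s) - np.dot(s, s.T)   # ∂ŷ/∂z for this sample
            down_grad[i] = np.dot(jacobian, up_grad[i])  # chain with ∂L/∂ŷ
        return down_grad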
3. Linear model derivative \(\left(\frac{\partial z}{\partial w}\right)\):
In the Linear layer:
def backward(self, up_grad: np.ndarray) -> np.ndarray:  # Linear
    self.dw = np.dot(self.inp.T, up_grad)             # ∂L/∂w
    self.db = np.sum(up_grad, axis=0, keepdims=True)  # ∂L/∂b
    down_grad = np.dot(up_grad, self.w.T)             # ∂L/∂input
    return down_grad
up_grad is \(\frac{\partial L}{\partial z}\) from the activation. Multiplying with inp.T applies \(\frac{\partial z}{\partial w}\) to get \(\frac{\partial L}{\partial w}\). This is step 3 of the chain rule.
Full Chain in Code:
Loss backward \(\rightarrow\) CrossEntropy.backward(): \(\frac{\partial L}{\partial \hat{y}}\)
Activation backward \(\rightarrow\) Softmax.backward(): \(\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}\)
Linear backward \(\rightarrow\) Linear.backward(): \(\frac{\partial L}{\partial w} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial w}\)
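Putting the three steps together, one forward/backward sweep could be wired as follows. This is a usage sketch: it assumes the CrossEntropy and Softmax classes sketched above, a Linear layer with an (in_features, out_features) constructor like the one sketched at the end of this section, and arbitrary example data.

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])   # two samples, two features
y = np.array([[1.0, 0.0], [0.0, 1.0]])   # one-hot targets, two classes

linear = Linear(2, 2)
softmax = Softmax()
loss = CrossEntropy()

# Forward pass
z = linear(X)                  # z = XW + b
y_hat = softmax(z)             # ŷ = softmax(z)
loss_value = loss(y_hat, y)    # L = CrossEntropy(ŷ, y)

# Backward pass: the chain rule, layer by layer in reverse
grad = loss.backward()         # ∂L/∂ŷ
grad = softmax.backward(grad)  # ∂L/∂z
grad = linear.backward(grad)   # fills linear.dw and linear.db, returns ∂L/∂X

# Gradient-descent update of the Linear parameters
linear.step(lr=0.1)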
Linear:
Abstract Python Code for Layer:
import numpy as np

class Layer:
    """Base interface that every layer (e.g. Linear, Softmax) implements."""

    def __init__(self):
        self.inp = None   # cached input from the forward pass
        self.out = None   # cached output from the forward pass

    def __call__(self, inp: np.ndarray) -> np.ndarray:
        return self.forward(inp)

    def forward(self, inp: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def backward(self, up_grad: np.ndarray) -> np.ndarray:
        raise NotImplementedError

    def step(self, lr: float) -> None:
        # Layers without parameters (e.g. Softmax) leave this as a no-op.
        pass
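For completeness, the Linear layer referenced above could subclass Layer like this. The backward body is the one quoted earlier; the constructor signature, weight initialisation, and step() update are assumptions:

class Linear(Layer):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Small random weights and zero bias (an assumed initialisation).
        self.w = np.random.randn(in_features, out_features) * 0.01
        self.b = np.zeros((1, out_features))
        self.dw = None
        self.db = None

    def forward(self, inp: np.ndarray) -> np.ndarray:
        self.inp = inp
        self.out = np.dot(inp, self.w) + self.b   # z = XW + b
        return self.out

    def backward(self, up_grad: np.ndarray) -> np.ndarray:
        self.dw = np.dot(self.inp.T, up_grad)              # ∂L/∂w
        self.db = np.sum(up_grad, axis=0, keepdims=True)   # ∂L/∂b
        return np.dot(up_grad, self.w.T)                   # ∂L/∂input

    def step(self, lr: float) -> None:
        # Plain gradient-descent update.
        self.w -= lr * self.dw
        self.b -= lr * self.db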
Input features: \(X\)
Feature weights: \(W\)
Bias term: \(b\)
Activation function: \(f\)
Output of the neuron: \(\hat{y} = f(XW + b)\)
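As a tiny concrete illustration of these pieces (the numbers and the sigmoid activation are arbitrary choices for this example):

import numpy as np

X = np.array([[1.0, 2.0]])        # input features
W = np.array([[0.5], [-0.25]])    # feature weights
b = np.array([[0.1]])             # bias term

z = np.dot(X, W) + b              # z = XW + b  ->  [[0.1]]
y_hat = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation  ->  approx. [[0.525]]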