The neural network architecture defines how data flows through the network. Different architectures are suitable for different types of data:
Feedforward network: a basic architecture where information moves in one direction, from input to output, through hidden layers. Each neuron in a layer connects to every neuron in the next layer. The computation at layer l is

a^(l) = σ(W^(l) a^(l-1) + b^(l))

where σ is the activation function, W are the weights, b are the biases, and a are the activations.
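As a minimal sketch of one such layer in NumPy (the dense_forward name and the choice of tanh are illustrative, not part of any particular library):

```python
import numpy as np

def dense_forward(a_prev, W, b, activation=np.tanh):
    """One fully connected layer: a = sigma(W @ a_prev + b)."""
    z = W @ a_prev + b    # weighted sum plus bias
    return activation(z)  # non-linear activation

# Example: 3 inputs -> 4 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(4)
x = np.array([0.5, -0.2, 0.1])
print(dense_forward(x, W, b))
```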
Convolutional neural network (CNN): specialized for grid-like data such as images. Convolutional layers preserve spatial relationships through local receptive fields and shared weights. The 2D convolution of an input with a kernel is

(I * K)(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n)

where I is the input, K is the kernel, and * is the convolution operation (as in most deep learning libraries, this form is technically cross-correlation; since the kernel weights are learned, the distinction does not matter in practice).
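A naive NumPy implementation of the operation above, for illustration only (real frameworks use heavily optimized kernels):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image
    and take the elementwise-product sum at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# Example: vertical-edge kernel on a 5x5 image
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
print(conv2d(image, kernel))
```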
Transformer: uses self-attention mechanisms to process sequential data while handling long-range dependencies better than RNNs. Scaled dot-product attention is

Attention(Q, K, V) = softmax(QK^T / √d_k) V

where Q, K, and V are the queries, keys, and values respectively, and d_k is the dimension of the keys.
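A minimal NumPy sketch of scaled dot-product attention (single head, no masking or learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # weighted sum of values

# Example: 4 tokens, d_k = 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```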
The forward pass computes the output of the network for a given input by propagating data through each layer:

z^(l) = W^(l) a^(l-1) + b^(l),   a^(l) = σ(z^(l))

This computation is repeated for each layer until the final output is produced.
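A sketch of the full forward pass, assuming the parameters are stored as a list of (W, b) pairs (an illustrative layout, not a standard API):

```python
import numpy as np

def forward(x, params, activation=np.tanh):
    """Propagate input x through every layer in params."""
    a = x
    for W, b in params:
        z = W @ a + b      # z^(l) = W^(l) a^(l-1) + b^(l)
        a = activation(z)  # a^(l) = sigma(z^(l))
    return a

# Example: 3 -> 4 -> 2 network
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
print(forward(np.array([0.5, -0.2, 0.1]), params))
```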
Activation functions are non-linear functions that give the model the complexity it needs: without them, a stack of layers would collapse into a single linear transformation. Common choices include ReLU, sigmoid, and tanh.
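For illustration, here are three widely used activations together with the derivatives that backpropagation needs (the σ and σ' of the formulas below):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_prime(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    return 1.0 - np.tanh(x) ** 2

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), np.tanh(x))
```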
Backpropagation efficiently computes gradients of the loss function with respect to each parameter by applying the chain rule:

δ^(L) = ∇_a J ⊙ σ'(z^(L))
δ^(l) = ((W^(l+1))^T δ^(l+1)) ⊙ σ'(z^(l))
∂J/∂W^(l) = δ^(l) (a^(l-1))^T,   ∂J/∂b^(l) = δ^(l)

where J is the loss function, ⊙ is element-wise multiplication, and σ' is the derivative of the activation function.
Backpropagation works by traversing the computation graph backward from the loss: the output-layer error δ^(L) is computed first, then propagated back through each layer, yielding the gradient of every weight and bias along the way, as in the sketch below.
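A sketch for a two-layer tanh network, assuming a mean-squared-error loss (both choices are illustrative):

```python
import numpy as np

def backward(x, y, params):
    """Backpropagation through a two-layer tanh network with
    loss J = 0.5 * ||a2 - y||^2 (illustrative choices)."""
    (W1, b1), (W2, b2) = params
    # Forward pass, caching the activations
    a1 = np.tanh(W1 @ x + b1)
    a2 = np.tanh(W2 @ a1 + b2)
    # Output error: delta^(L) = grad_a J ⊙ sigma'(z^(L)); tanh' = 1 - tanh^2
    delta2 = (a2 - y) * (1.0 - a2 ** 2)
    # Hidden error: delta^(l) = (W^(l+1)^T delta^(l+1)) ⊙ sigma'(z^(l))
    delta1 = (W2.T @ delta2) * (1.0 - a1 ** 2)
    # Gradients: dJ/dW^(l) = delta^(l) (a^(l-1))^T, dJ/db^(l) = delta^(l)
    return [(np.outer(delta1, x), delta1), (np.outer(delta2, a1), delta2)]

# Example with a 3 -> 4 -> 2 network
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
grads = backward(np.array([0.5, -0.2, 0.1]), np.array([1.0, 0.0]), params)
print([g[0].shape for g in grads])  # [(4, 3), (2, 4)]
```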
Optimizers adjust network parameters to minimize the loss function. The simplest, (stochastic) gradient descent, steps against the gradient:

θ ← θ − η ∇_θ J(θ)

where θ are the parameters and η is the learning rate.
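A one-line sketch of the update, using the same hypothetical (W, b) parameter layout as the earlier examples:

```python
def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update per layer: theta <- theta - eta * grad."""
    return [(W - lr * dW, b - lr * db)
            for (W, b), (dW, db) in zip(params, grads)]

# Usage with the backward() sketch above:
# params = sgd_step(params, backward(x, y, params))
```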
Adam (Adaptive Moment Estimation) combines ideas from RMSprop and momentum:

m_t = β1 m_{t−1} + (1 − β1) g_t
v_t = β2 v_{t−1} + (1 − β2) g_t²
m̂_t = m_t / (1 − β1^t),   v̂_t = v_t / (1 − β2^t)
θ_t = θ_{t−1} − η m̂_t / (√v̂_t + ε)

where m and v are estimates of the first and second moments of the gradients, and hats indicate bias-corrected versions.
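A minimal sketch of one Adam step for a single parameter array (the hyperparameter defaults follow the values suggested in the original Adam paper):

```python
import numpy as np

def adam_step(theta, grad, state, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state holds the running moment estimates (m, v)."""
    m, v = state
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v)

# Example: minimize ||theta||^2, whose gradient is 2 * theta
theta = np.array([1.0, -2.0])
state = (np.zeros_like(theta), np.zeros_like(theta))
for t in range(1, 4):
    theta, state = adam_step(theta, 2 * theta, state, t)
print(theta)
```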
Learning rate schedules adapt the learning rate during training; common techniques include step decay, exponential decay, and cosine annealing.
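Two common schedules written as simple functions of the epoch number (the forms are standard; the constants are illustrative):

```python
import math

def step_decay(lr0, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` every `every` epochs."""
    return lr0 * (drop ** (epoch // every))

def cosine_annealing(lr0, epoch, total_epochs):
    """Smoothly decay the learning rate to 0 along a cosine curve."""
    return lr0 * 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))

for e in (0, 10, 20):
    print(step_decay(0.1, e), cosine_annealing(0.1, e, 20))
```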