Deep learning is one of the most successful recent techniques in computer vision and automated data processing in general. The basic idea of supervised machine learning is to define a parameterized function, called a network, and to optimize the parameters such that the resulting function maps given inputs x to desired outputs y on a training set of pairs (x, y) -- a process referred to as training the network. The term deep learning refers to network architectures designed as deeply nested compositions of simple building blocks. The ultimate goal of machine learning is to approximate the true underlying but unknown relation between input and output, such that the trained network makes good predictions even on examples outside the training set -- an aspect referred to as generalization.
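To make the notion of training concrete, here is a minimal sketch (not part of the course material): a linear model f(x; w, b) = w·x + b fitted to synthetic pairs (x, y) by gradient descent on the mean squared error. The data-generating relation y = 2x + 1 is an assumption made purely for illustration.

```python
import numpy as np

# Training pairs (x, y) drawn from an assumed "true" relation y = 2x + 1;
# in practice the relation is unknown and only the pairs are available.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0

# Parameterized function ("network") f(x; w, b) = w*x + b
w, b = 0.0, 0.0
lr = 0.5
for _ in range(200):
    pred = w * x + b
    grad_w = np.mean(2.0 * (pred - y) * x)  # d/dw of the mean squared error
    grad_b = np.mean(2.0 * (pred - y))      # d/db of the mean squared error
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.2f}, b={b:.2f}")  # → w=2.00, b=1.00
```

Because the loss is quadratic in (w, b), gradient descent recovers the true parameters exactly; for the nested functions of deep learning, the same scheme applies with gradients computed via the chain rule.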
The quickly evolving (if not exploding) field of deep learning has led to amazing applications. Here is a fascinating example of combining computer graphics, audio, and video processing with deep learning: https://www.youtube.com/watch?v=9Yq67CjDqvw
This lecture will give an introduction to deep learning, describe common building blocks in the network architectures, introduce optimization algorithms for their training, and discuss strategies that improve the generalization. In particular, we will cover the following topics:
- Supervised machine learning as an interpolation problem
- Simple network architectures: Fully connected layers, rectified linear units, sigmoids, softmax
- Gradient descent for nested functions: The chain rule and its implementation via backpropagation
- Stochastic gradient descent on large data sets, acceleration via momentum and Adam
- Capacity, overfitting and underfitting of neural networks
- Training, testing, and validation data sets
- Improving generalization: data augmentation, dropout, early stopping
- Working with images: Convolutions and pooling layers. Computing derivatives and adjoint linear operators
- Getting the network to train: Data preprocessing, weight initialization schemes, and batch normalization
- Applications and state-of-the-art architectures for image classification, segmentation, and denoising
- Architecture designs: Encoder-decoder idea, unrolled algorithms, skip connections + residual learning, recurrent neural networks
- Implementations in NumPy and PyTorch: Hands-on practical experience by implementing gradient descent on a fully connected network in NumPy. Introduction to the deep learning framework PyTorch for training complex models on GPUs.
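As a preview of the NumPy exercise mentioned above, the following hedged sketch trains a one-hidden-layer fully connected network with ReLU activations by full-batch gradient descent on the XOR toy problem. The layer sizes, learning rate, and iteration count are illustrative choices, not the course's reference solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
Y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

# Small random weight initialization (sizes are illustrative)
W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)

lr = 0.1
losses = []
for _ in range(2000):
    # Forward pass through the nested function
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)            # rectified linear unit (ReLU)
    pred = a1 @ W2 + b2
    losses.append(np.mean((pred - Y) ** 2))
    # Backward pass: chain rule applied layer by layer (backpropagation)
    d_pred = 2.0 * (pred - Y) / len(X)
    dW2 = a1.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_z1 = (d_pred @ W2.T) * (z1 > 0)   # ReLU derivative is an indicator
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)
    # Gradient descent step on all parameters
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The backward pass is exactly the chain rule written out by hand; PyTorch automates this bookkeeping via automatic differentiation, which is why the course moves to that framework for larger models.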
Besides the lecture notes, the relevant literature for this course includes:
- Coursera course "Machine Learning" by Andrew Ng
Further references to recent literature will be given in the lecture.
The discussion on the number of kinks (respectively number of linear regions) in network architectures can be found here https://arxiv.org/pdf/1402.1869.pdf
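To illustrate what a "kink" is, here is a small sketch (with hypothetical weight values) of a one-hidden-layer ReLU network with scalar input, f(x) = Σᵢ vᵢ·max(wᵢx + bᵢ, 0). Such a function is piecewise linear, and each hidden unit contributes at most one kink, located where its pre-activation wᵢx + bᵢ changes sign; the cited paper studies how depth multiplies the number of such linear regions.

```python
import numpy as np

# One-hidden-layer ReLU network with scalar input (hypothetical weights)
w = np.array([1.0, -2.0, 0.5])   # hidden-layer weights
b = np.array([0.3, 1.0, -0.2])   # hidden-layer biases
v = np.array([1.0, 1.0, 1.0])    # output weights

def f(x):
    # Evaluate the network on an array of inputs x
    return np.maximum(np.outer(x, w) + b, 0.0) @ v

# Each unit's pre-activation w_i*x + b_i crosses zero at x = -b_i/w_i,
# so a single hidden layer with n units yields at most n kinks.
kinks = np.sort(-b / w)
print(kinks)  # [-0.3  0.4  0.5]
```

Between consecutive kinks the network is an affine function of x, which can be checked by evaluating it on any interval that avoids the kink locations.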