Deep Learning

Feb 24, 2024 · 1 min read

All the different classes can be found here!

This is the CC66204 course from the Universidad de Chile. Based on Ivan Sipiran's class, I added details to each lecture to show how we get to the recent large multimodal models. The GitHub repository is here.
Here’s a summary:

  1. General introduction: Overview of the class, a refresher on Machine Learning,…
  2. [TODO] Basics: Perceptron, Vanilla Gradient Descent, MLP, Backprop
  3. [TODO] Losses and Activations: General losses, Softmax, Cross-Entropy (CE), Activation functions (a softmax/cross-entropy sketch follows this list)
  4. [TODO] Initialization and Optimization: Weight initialization, Advanced gradient-descent variants
  5. Regularization: Penalization, Dropout, Data augmentation
  6. [TODO] Convolutional Layer: Convolution, Padding, Pooling, LeNet
  7. Computer Vision Architectures: ImageNet, Revolution of depth, Classical classifier architectures
  8. Transfer Learning: Motivation, Principle, Types of TL, Weight unfreezing, Pre-training datasets, SoTA
  9. Object Detection: Principle, IoU and mAP (see the IoU sketch after this list), Classical Object Detection, Segmentation and Mask-RCNN, SoTA
  10. [TODO] Recurrent Layer: Sequential Modeling, RNN, LSTM, GRU,…
  11. [TODO] Attention:
  12. [TODO] Transformers:
  13. Generative Large Language Models: Language Modeling and Temperature (see the sampling sketch after this list), Abilities and In-Context Learning, Tokenization, Instructions, Alignment, Reasoning, Training and Evaluating in Practice, LLMs as Agents
  14. Large Multimodal Models: Multimodality, Fusion, Original tasks and datasets, Early multimodal transformers, CLIP and text2image Diffusion, Frozen encoders, BLIP 1/2/3 and LMM Assistants, Open-source training datasets, LMM evaluation, Video, Multimodal Tokenization
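
To make the loss material of class 3 concrete, here is a minimal numpy sketch of the softmax / cross-entropy pair; the logit values and target label are made up for illustration and are not taken from the course notes.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(z)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(logits, target):
    # Negative log-probability assigned to the correct class
    probs = softmax(logits)
    return -np.log(probs[target])

logits = np.array([2.0, 0.5, -1.0])   # made-up scores for 3 classes
print(softmax(logits))                # probabilities summing to 1
print(cross_entropy(logits, target=0))
```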
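
For class 9, a minimal sketch of IoU between two axis-aligned boxes, assuming the usual (x1, y1, x2, y2) corner format; the box coordinates are illustrative only.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.143
```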
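
And for the temperature part of class 13, a minimal sketch of temperature-scaled sampling from next-token logits; the logit values are made up, and the helper name is just for this example.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    # Divide logits by the temperature: T < 1 sharpens the distribution
    # (closer to greedy decoding), T > 1 flattens it (more random)
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)      # sampled token index

logits = [4.0, 2.0, 0.5]                        # made-up next-token scores
for t in (0.2, 1.0, 2.0):
    print(t, sample_with_temperature(logits, temperature=t))
```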