AI003 Professional

Introduction to Deep Learning

Deep learning is a sub-field of machine learning that focuses on learning complex, hierarchical feature representations from raw data using artificial neural networks. The course covers fundamental principles, underlying mathematics, optimization concepts (gradient descent, backpropagation), network modules (linear, convolution, pooling layers), and common architectures (CNNs, RNNs). Applications demonstrated include computer vision, natural language processing, and reinforcement learning. Students will use the PyTorch deep learning library for implementation and complete a final project on a real-world scenario.

5.0

30.0h

512 students

Artificial Intelligence

Lessons

1 Lesson 1

AI003: Deep Learning Fundamentals and Optimization (Lesson 1) introduces deep learning as a high-dimensional function approximation task built upon linear algebra and multivariate calculus. Students will learn how to implement and optimize neural networks by mastering the core training cycle of forward passes, backpropagation, and weight updates using vectorized matrix operations.
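The training cycle named above (forward pass, gradient computation, weight update) can be sketched on the smallest possible model, a one-variable linear fit. This is a minimal illustration in plain Python rather than vectorized matrix code; the function name `train_linear`, the learning rate, the epoch count, and the toy data are all assumptions chosen for the example, not course material.

```python
# Fit y = w*x + b by batch gradient descent on mean squared error (MSE).
def train_linear(xs, ys, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: predictions and residuals
        preds = [w * x + b for x in xs]
        errs = [p - y for p, y in zip(preds, ys)]
        # Backward pass: analytic gradients of MSE = mean(err^2)
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Weight update: step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated by y = 2x + 1
w, b = train_linear(xs, ys)
```

On this toy data the loop should recover a slope close to 2 and an intercept close to 1. In a real network the two gradient lines are replaced by backpropagation through every layer.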

2 Lesson 2

This lesson introduces PyTorch Tensors as the fundamental multi-dimensional data structures used for hardware-accelerated computation and model parameters. It further explores the dynamic computation graph and the autograd engine, which allow for flexible, real-time gradient tracking and automatic differentiation during neural network training.
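To make the idea of a dynamic computation graph concrete, here is a toy reverse-mode differentiation sketch in plain Python. It is not PyTorch's autograd engine and does not use its API; the `Value` class and its attributes are hypothetical names invented for illustration, and only scalar addition and multiplication are supported.

```python
class Value:
    """A scalar that records how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps upstream grad to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, fn in zip(v._parents, v._grad_fns):
                p.grad += fn(v.grad)  # accumulate when a node is reused


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b * b  # the graph is built while the expression runs
y.backward()       # fills in x.grad, w.grad, b.grad
```

Here `y` is `w*x + b*b`, so the expected gradients are `x` for `w`, `w` for `x`, and `2*b` for `b`. The graph exists only because the expression was executed, which is the sense in which it is "dynamic".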

3 Lesson 3

EvoClass-AI003: From Fully Connected to Convolutional (Lesson 3) explores the limitations of dense layers in image processing, such as parameter explosion and the loss of spatial locality. It introduces Convolutional Neural Networks (CNNs) as a solution, focusing on the use of receptive fields, weight sharing, and the mathematical definition of the 2D convolution operation.
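A small sketch may help tie receptive fields and weight sharing to the 2D convolution itself. The version below uses plain Python lists, no padding, and stride 1, and computes the cross-correlation form that deep learning libraries conventionally call "convolution". The function name `conv2d_valid`, the toy image, and the layer sizes in the parameter comparison are assumptions for illustration only.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1), cross-correlation form."""
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kH + 1):
        row = []
        for j in range(W - kW + 1):
            # Each output sees only a kH x kW receptive field, and the
            # same kernel weights are reused at every position.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kH) for b in range(kW)))
        out.append(row)
    return out


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_valid(image, kernel)

# Parameter explosion, with assumed sizes: a 224x224 RGB input feeding
# 1000 dense units versus a 3x3 convolution with 64 output channels.
dense_weights = 224 * 224 * 3 * 1000
conv_weights = 3 * 3 * 3 * 64
```

With these assumed sizes the dense layer needs about 150 million weights while the convolution needs 1,728, because the kernel is shared across all spatial positions.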

4 Lesson 4

EvoClass-AI003: Overview and Architectural Evolution (Lesson 4) explores the evolution of deep CNNs by analyzing how VGG, GoogLeNet, and ResNet addressed challenges in depth, computational efficiency, and gradient stability. Students will learn how these seminal architectures utilize techniques like small kernel stacking, bottleneck layers, and skip connections to optimize performance in ultra-deep networks.
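Two of the techniques mentioned above reduce to short arithmetic. The sketch below compares the weight count of two stacked 3x3 convolutions with a single 5x5 convolution covering the same receptive field, and shows a skip connection as plain addition. The channel count of 64 and the helper names are assumptions made for the example; biases are ignored.

```python
def conv_weights(kernel_size, channels_in, channels_out):
    """Weights in one conv layer, ignoring biases."""
    return kernel_size * kernel_size * channels_in * channels_out


def stacked_receptive_field(kernel_size, num_layers):
    # With stride 1, each extra layer widens the field by (kernel_size - 1).
    return 1 + num_layers * (kernel_size - 1)


C = 64  # assumed channel count, same in and out
two_3x3 = 2 * conv_weights(3, C, C)
one_5x5 = conv_weights(5, C, C)


def residual(f, x):
    """Skip connection: output is x + f(x), so f only learns a correction."""
    return [xi + fi for xi, fi in zip(x, f(x))]


# If the residual branch outputs zeros, the block is the identity map.
identity_out = residual(lambda v: [0.0] * len(v), [1.0, 2.0, 3.0])
```

Two 3x3 layers cover a 5x5 field with 18C² weights instead of 25C², and add an extra nonlinearity in between. The identity behavior of the residual block is what gives gradients a direct path through very deep stacks.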

5 Lesson 5

EvoClass-AI003: Recurrent Neural Networks and Sequence Modeling (Lesson 5) explores the transition from static data models to sequential data by introducing Recurrent Neural Networks (RNNs). Students will learn how RNNs utilize shared parameters and hidden states to maintain temporal memory, while also examining the challenges of gradient instability in long-sequence processing.
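The recurrence and the gradient problem can both be seen in a scalar RNN. The sketch below assumes the common update h_t = tanh(w_x·x_t + w_h·h_{t-1} + b) with scalar weights; the function names and the numeric values are illustrative assumptions rather than anything specified by the lesson.

```python
import math


def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run a scalar RNN; the same w_x, w_h, b are shared across all steps."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)  # hidden state carries memory
        states.append(h)
    return states


def grad_wrt_h0(xs, w_x, w_h, b, h0=0.0):
    """d h_T / d h_0 as a product of per-step factors w_h * (1 - h_t^2)."""
    g = 1.0
    for h in rnn_forward(xs, w_x, w_h, b, h0):
        g *= w_h * (1.0 - h * h)
    return g


xs = [0.1] * 50
states = rnn_forward(xs, w_x=1.0, w_h=0.5, b=0.0)
g = grad_wrt_h0(xs, w_x=1.0, w_h=0.5, b=0.0)
```

Because each factor has magnitude at most |w_h|, fifty steps with w_h = 0.5 shrink the gradient below 0.5^50, which is the vanishing-gradient effect. With |w_h| well above 1 and unsaturated states, the same product can grow instead.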

6 Lesson 6

EvoClass-AI003: From Recurrence to Attention (Lesson 6) explores how attention mechanisms overcome the scalability and information bottleneck limitations of traditional RNNs by enabling direct, parallelized dependency modeling. The lesson details the transition from fixed-size context vectors to dynamic, weighted contextual sums using the Query, Key, and Value tensor framework.
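The weighted contextual sum can be sketched directly from the usual scaled dot-product form, softmax(QKᵀ/√d_k)·V. The version below uses plain Python lists and loops for readability, with no masking and a single head; the function names and the tiny inputs are assumptions for illustration.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        all_weights.append(weights)
        # Context vector: weighted sum of the value rows
        outputs.append([sum(w * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return outputs, all_weights


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]  # identical keys, so weights are uniform
V = [[2.0, 0.0], [4.0, 2.0]]
out, weights = attention(Q, K, V)
```

Every query is scored against every key independently, so the loop over `Q` has no step-to-step dependency. That independence is what allows the computation to run in parallel as matrix products, unlike an RNN's sequential state update.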

7 Lesson 7

EvoClass-AI003: From Sparse Vectors to Semantic Space (Lesson 7) explores the limitations of sparse representations such as one-hot encoding, which suffer from extreme dimensionality and carry no semantic meaning. It introduces dense word embeddings as a solution that captures linguistic relationships in a continuous, low-dimensional vector space.
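The difference between the two representations shows up in cosine similarity. In the sketch below, the three-word vocabulary and the 2-D dense vectors are hand-picked, hypothetical values, not learned embeddings; they only illustrate the geometry.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["cat", "dog", "car"]

# One-hot: one dimension per word, so the size grows with the vocabulary.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Hypothetical dense vectors, chosen by hand for illustration only.
dense = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}

sim_one_hot = cosine(one_hot["cat"], one_hot["dog"])
sim_cat_dog = cosine(dense["cat"], dense["dog"])
sim_cat_car = cosine(dense["cat"], dense["car"])
```

Any two distinct one-hot vectors are orthogonal, so every pair of words looks equally unrelated. In the dense space, "cat" sits closer to "dog" than to "car", which is the kind of relationship learned embeddings are meant to capture.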

8 Lesson 8

This lesson introduces generative modeling as a shift from discriminative tasks to learning the underlying data distribution $P(x)$ through explicit density models like Variational Autoencoders (VAEs) and implicit models like GANs. It specifically explores the VAE framework, detailing how variational inference and the ELBO objective enable the creation of structured, continuous latent spaces for effective data synthesis and representation learning.
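For reference, the ELBO objective is usually written as a lower bound on the log-likelihood. The notation here is an assumption, since the lesson summary does not fix one: $p_\theta$ is the decoder (the model's approximation of $P(x)$), $q_\phi$ is the encoder's approximate posterior, and $p(z)$ is the prior over latent variables, typically a standard normal.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
$$

The first term rewards accurate reconstruction of $x$ from sampled latents. The KL term pulls the approximate posterior toward the prior, which is what keeps the latent space continuous and structured.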

9 Lesson 9

This lesson introduces Deep Reinforcement Learning (DRL) as a framework where agents learn optimal policies through trial-and-error interactions within a Markov Decision Process (MDP). Students will explore how agents use scalar reward signals, discount factors, and the Markov property to make sequential decisions and maximize long-term cumulative returns.
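The "long-term cumulative return" is typically the discounted sum G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …, which can be computed in one backward pass over an episode. The function name and the reward sequence below are assumptions for illustration.

```python
def discounted_returns(rewards, gamma):
    """Return G_t for every step t, using G_t = r_t + gamma * G_{t+1}."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    out.reverse()
    return out


rewards = [1.0, 0.0, 2.0]  # one short, made-up episode
returns = discounted_returns(rewards, gamma=0.5)
```

A discount factor below 1 makes later rewards count less: here the reward of 2 at the last step contributes only 0.5 to the return at the first step.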

10 Lesson 10

This lesson explores the labeling spectrum in machine learning, contrasting the high-cost requirements of supervised learning with the structural discovery of unsupervised learning and the hybrid efficiency of semi-supervised learning. It further examines deep unsupervised learning through autoencoders, which utilize an encoder-decoder architecture to compress data into meaningful latent representations.

Course Overview

📚 Content Summary

Core objectives: master deep learning theory, implement models in PyTorch, understand specialized architectures (CNNs, RNNs, Transformers), and apply these concepts to computer vision, NLP, and sequential decision-making.

🎯 Learning Objectives

Explain the mathematical foundations and core optimization techniques (Gradient Descent, Backpropagation) necessary for training deep neural networks.
Use the PyTorch deep learning framework to implement, train, and debug modern network architectures, taking advantage of CUDA acceleration and efficient data handling techniques.
Design and analyze specialized architectures, including Convolutional Neural Networks (CNNs) for image data and the Transformer model for sequential dependencies.
Apply deep learning techniques to solve practical problems in core application domains: Computer Vision, Natural Language Processing, and Reinforcement Learning.
Evaluate models based on robustness, interpretability, and ethical fairness, comparing the strengths of various advanced paradigms (e.g., Generative Models, Semi-Supervised Learning).

Lessons

Overview: This lesson provides a deep dive into the paradigm shift introduced by the "Attention Is All You Need" paper, which moved sequence modeling beyond Recurrent Neural Networks (RNNs) by eliminating recurrence and relying solely on attention. We will first establish the mathematical foundation of the Attention Mechanism, focusing on Scaled Dot-Product Attention with Query (Q), Key (K), and Value (V) vectors. The lecture then extends this concept to Multi-Head Attention and explains its role in capturing diverse contextual dependencies. The core focus is the complete Transformer architecture: the structure of the Encoder and Decoder stacks, including Residual Connections, Layer Normalization, and the Positional Encoding required to preserve sequence order. Finally, we examine how the Transformer enables significant parallelization and its impact on fields such as Neural Machine Translation and pre-trained language models.

Learning Outcomes:

Define the purpose of attention mechanisms and explain how they resolve the limitations (e.g., long-range dependencies, sequential processing bottleneck) of Recurrent Neural Networks.
Detail the mathematical operation of Scaled Dot-Product Attention, accurately identifying the roles of Query, Key, and Value vectors.
Describe the overall structure of the Transformer model, differentiating between the Encoder and Decoder stacks and explaining the function of Multi-Head Attention and Feed-Forward Networks.
Explain the necessity and mathematical implementation of Positional Encoding, given that self-attention on its own is order-agnostic (permutation-equivariant).
Analyze the computational benefits (parallelization) and widespread applicability of the Transformer architecture in modern Deep Learning tasks, referencing models like BERT and GPT.
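As a concrete reference for the Positional Encoding outcome above, the sinusoidal scheme from "Attention Is All You Need" can be sketched as follows: even dimensions use sin(pos / 10000^(2i/d_model)) and odd dimensions use the matching cosine. The function name and the small sizes are assumptions for illustration; learned positional embeddings are a common alternative.

```python
import math


def positional_encoding(num_positions, d_model):
    """Sinusoidal positional encodings as a list of rows, one per position."""
    pe = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share one frequency: exponent 2i / d_model
            angle = pos / (10000 ** ((j - j % 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


pe = positional_encoding(num_positions=4, d_model=4)
```

Each row is added to the token embedding at that position, so two identical tokens at different positions enter the attention layers with different vectors.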
