Modern Computer Vision with PyTorch
Contributors
About the reviewers
Learn more on Discord
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Share your thoughts
Download a free PDF copy of this book
Section 1
Fundamentals of Deep Learning for Computer Vision
Artificial Neural Network Fundamentals
Comparing AI and traditional machine learning
Learning about the ANN building blocks
Implementing feedforward propagation
Calculating the hidden layer unit values
Applying the activation function
Calculating the output layer values
Calculating loss values
Feedforward propagation in code
Implementing backpropagation
Gradient descent in code
Implementing backpropagation using the chain rule
Putting feedforward propagation and backpropagation together
Understanding the impact of the learning rate
Learning rate of 0.01
Learning rate of 0.1
Learning rate of 1
Summarizing the training process of a neural network
Summary
Questions
Learn more on Discord
Installing PyTorch
PyTorch tensors
Initializing a tensor
Operations on tensors
Auto gradients of tensor objects
Advantages of PyTorch’s tensors over NumPy’s ndarrays
Building a neural network using PyTorch
Dataset, DataLoader, and batch size
Predicting on new data points
Implementing a custom loss function
Fetching the values of intermediate layers
Using a sequential method to build a neural network
Saving and loading a PyTorch model
Using state_dict
Saving
Loading
Summary
Questions
Learn more on Discord
Representing an image
Converting images into structured arrays and scalars
Creating a structured array for colored images
Why leverage neural networks for image analysis?
Preparing our data for image classification
Training a neural network
Scaling a dataset to improve model accuracy
Understanding the impact of varying the batch size
Batch size of 32
Batch size of 10,000
Understanding the impact of varying the loss optimizer
Building a deeper neural network
Understanding the impact of batch normalization
Very small input values without batch normalization
Very small input values with batch normalization
The concept of overfitting
Impact of adding dropout
Impact of regularization
Summary
Questions
Learn more on Discord
Section 2
Object Classification and Detection
Introducing Convolutional Neural Networks
The problem with traditional deep neural networks
Building blocks of a CNN
Convolution
Filters
Strides and padding
Pooling
Putting them all together
How convolution and pooling help in image translation
Implementing a CNN
Classifying images using deep CNNs
Visualizing the outcome of feature learning
Building a CNN for classifying real-world images
Impact on the number of images used for training
Summary
Questions
Learn more on Discord
Introducing transfer learning
Understanding the VGG16 architecture
Implementing VGG16
Understanding the ResNet architecture
Implementing ResNet18
Implementing facial keypoint detection
2D and 3D facial keypoint detection
Implementing age estimation and gender classification
Introducing the torch_snippets library
Summary
Questions
Learn more on Discord
Generating CAMs
Understanding the impact of data augmentation and batch normalization
Coding up road sign detection
Practical aspects to take care of during model implementation
Imbalanced data
The size of the object within an image
The difference between training and validation data
The number of nodes in the flatten layer
Image size
OpenCV utilities
Summary
Questions
Learn more on Discord
Introducing object detection
Creating a bounding-box ground truth for training
Understanding region proposals
Leveraging SelectiveSearch to generate region proposals
Implementing SelectiveSearch to generate region proposals
Understanding IoU
Non-max suppression
Mean average precision
Training R-CNN-based custom object detectors
Working details of R-CNN
Implementing R-CNN for object detection on a custom dataset
Downloading the dataset
Training Fast R-CNN-based custom object detectors
Working details of Fast R-CNN
Implementing Fast R-CNN for object detection on a custom dataset
Summary
Questions
Learn more on Discord
Components of modern object detection algorithms
Anchor boxes
Region proposal network
Classification and regression
Training Faster R-CNN on a custom dataset
Working details of YOLO
Training YOLO on a custom dataset
Installing Darknet
Setting up the dataset format
Configuring the architecture
Training and testing the model
Working details of SSD
Components in SSD code
Training SSD on a custom dataset
Summary
Questions
Learn more on Discord
Exploring the U-Net architecture
Performing upscaling
Implementing semantic segmentation using U-Net
Exploring the Mask R-CNN architecture
RoI Align
Mask head
Implementing instance segmentation using Mask R-CNN
Predicting multiple instances of multiple classes
Summary
Questions
Learn more on Discord
Multi-object instance segmentation
Fetching and preparing data
Training the model for instance segmentation
Making inferences on a new image
Human pose detection
Crowd counting
Implementing crowd counting
Image colorization
3D object detection with point clouds
Theory
Training the YOLO model for 3D object detection
Action recognition from video
Identifying an action in a given video
Training a recognizer on a custom dataset
Summary
Questions
Learn more on Discord
Section 3
Image Manipulation
Understanding autoencoders
How autoencoders work
Implementing vanilla autoencoders
Implementing convolutional autoencoders
Grouping similar images using t-SNE
Understanding variational autoencoders
The need for VAEs
How VAEs work
KL divergence
Building a VAE
Performing an adversarial attack on images
Understanding neural style transfer
How neural style transfer works
Performing neural style transfer
Understanding deepfakes
How deepfakes work
Generating a deepfake
Summary
Questions
Learn more on Discord
Introducing GANs
Using GANs to generate handwritten digits
Using DCGANs to generate face images
Implementing conditional GANs
Summary
Questions
Learn more on Discord
Leveraging the Pix2Pix GAN
Leveraging CycleGAN
How CycleGAN works
Implementing CycleGAN
Leveraging StyleGAN on custom images
The evolution of StyleGAN
Implementing StyleGAN
Introducing SRGAN
Architecture
Coding SRGAN
Summary
Questions
Learn more on Discord
Section 4
Combining Computer Vision with Other Techniques
Learning the basics of reinforcement learning
Calculating the state value
Calculating the state-action value
Implementing Q-learning
Defining the Q-value
Understanding the Gym environment
Building a Q-table
Leveraging exploration-exploitation
Implementing deep Q-learning
Understanding the CartPole environment
Performing CartPole balancing
Implementing deep Q-learning with the fixed targets model
Understanding the use case
Coding up an agent to play Pong
Implementing an agent to perform autonomous driving
Setting up the CARLA environment
Training a self-driving agent
Summary
Questions
Learn more on Discord
Introducing transformers
Basics of transformers
How ViTs work
Implementing ViTs
Transcribing handwritten images
Handwriting transcription workflow
Handwriting transcription in code
Document layout analysis
Understanding LayoutLM
Implementing LayoutLMv3
Visual question answering
Introducing BLIP2
Implementing BLIP2
Summary
Questions
Learn more on Discord
Introducing CLIP
How CLIP works
Building a CLIP model from scratch
Leveraging OpenAI CLIP
Introducing SAM
How SAM works
Implementing SAM
How FastSAM works
Implementing FastSAM
Introducing diffusion models
How diffusion models work
Diffusion model architecture
Implementing a diffusion model from scratch
Conditional image generation
Understanding Stable Diffusion
Building blocks of the Stable Diffusion model
Implementing Stable Diffusion
Summary
Questions
Learn more on Discord
In-painting
Model training workflow
In-painting using Stable Diffusion
ControlNet
Architecture
Implementing ControlNet
SDXL Turbo
Architecture
Implementing SDXL Turbo
DepthNet
Workflow
Implementing DepthNet
Text to video
Workflow
Implementing text to video
Summary
Questions
Learn more on Discord
Understanding the basics of an API
Creating an API and making predictions on a local server
Installing the API module and dependencies
Serving an image classifier
Containerizing the application
Building a Docker image
Shipping and running the Docker container on the cloud
Configuring AWS
Creating a Docker repository on AWS ECR and pushing the image
Pulling the image and building the Docker container
Identifying data drift
Using vector stores
Summary
Questions
Learn more on Discord
Chapter 1, Artificial Neural Network Fundamentals
Chapter 2, PyTorch Fundamentals
Chapter 3, Building a Deep Neural Network with PyTorch
Chapter 4, Introducing Convolutional Neural Networks
Chapter 5, Transfer Learning for Image Classification
Chapter 6, Practical Aspects of Image Classification
Chapter 7, Basics of Object Detection
Chapter 8, Advanced Object Detection
Chapter 9, Image Segmentation
Chapter 10, Applications of Object Detection and Segmentation
Chapter 11, Autoencoders and Image Manipulation
Chapter 12, Image Generation Using GANs
Chapter 13, Advanced GANs to Manipulate Images
Chapter 14, Combining Computer Vision and Reinforcement Learning
Chapter 15, Combining Computer Vision and NLP Techniques
Chapter 16, Foundation Models in Computer Vision
Chapter 17, Applications of Stable Diffusion
Chapter 18, Moving a Model to Production
Learn more on Discord
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share your thoughts
Download a free PDF copy of this book
