Nov 29

ML interview preparation: computer vision

In machine learning interviews, the general questions are usually followed by project-specific ones, so I have prepared a few must-knows to help you prepare for and pass computer vision interviews.

Main tasks of computer vision

classification: the model learns what the object is
object detection: the model finds where an object is located (so we can draw a bounding box around it)
object tracking: the model locates an object and follows where it moves from frame to frame
face recognition: the model identifies whose face it is
edge detection: the model finds where object edges are
segmentation: the model finds the exact area an object occupies, so we can create a pixel-wise mask over it

Types of segmentation

semantic: all objects of one category are colored the same
instance: every object instance is separated from the others, even within one category

Popular computer vision libraries

OpenCV: an open source library and one of the first basic tools to reach for when familiarizing yourself with computer vision. On the internet you can find lots of usage examples, from finding facial features and building small recognition models to video analysis.

It also implements algorithms such as K-Nearest Neighbors, Bayes Classifier, Decision Trees, Support Vector Machines, neural networks and more.

Popular computer vision networks

CNN (its history started in the previous century): the convolutional neural network concept. It detects features in an image wherever they are located and does not need much image preprocessing.

AlexNet (2012)

ReLU instead of tanh, the standard activation at that time (made training much faster)
among the first to stack convolutional layers directly on top of each other
among the first to use dropout layers (the technique had only just been invented back then)
included optimization for multiple GPUs
won ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2012, as one of the first GPU-based CNNs to win an image recognition contest
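
One commonly cited reason for the ReLU speedup is that ReLU does not saturate for positive inputs, while the gradient of tanh shrinks toward zero as the input grows. Here is a minimal sketch that compares the two gradients; the function names are my own and not from any library.

```python
import math


def relu(x):
    """ReLU activation: passes positive values through, zeroes out the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: constant 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Gradient of tanh: 1 - tanh(x)^2, which approaches 0 for large |x|."""
    return 1.0 - math.tanh(x) ** 2


# Compare how much gradient each activation lets through as the input grows.
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x)}, tanh'={tanh_grad(x):.4f}")
```

For large inputs the tanh gradient almost vanishes, while the ReLU gradient stays at 1, so weight updates do not shrink in the same way.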

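VGGNet (2014): a CNN which uses smaller filters than AlexNet (stacks of 3×3 convolutions) in a much deeper network. A stack of small filters needs fewer parameters than one large filter covering the same receptive field, although the full network has more parameters than AlexNet (roughly 138 million for VGG-16 versus about 60 million). It has even better performance.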

GoogLeNet / Inception v1 (2014): a CNN which applies filters of multiple sizes on the same level, making the network wider and not only deeper. Won the ILSVRC classification task in 2014, leaving VGG in second place.

ResNet (2015): Residual Network, a CNN which introduces the residual block. Its skip connections mitigate the vanishing gradient problem, so the network can be much deeper. Despite that, it has a relatively small size (due to global average pooling instead of large fully connected layers). Won ILSVRC in 2015.
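
To make the residual block idea concrete, here is a minimal sketch of y = ReLU(F(x) + x). It is a simplification under my own assumptions: F is two tiny linear layers on a plain list, while a real ResNet block uses convolutions and batch normalization. All names are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is linear -> ReLU -> linear."""
    out = relu(linear(x, w1))
    out = linear(out, w2)
    # Skip connection: add the block input back to the block output.
    return relu([f + s for f, s in zip(out, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights F(x) is 0, so the block simply passes x through.
print(residual_block([1.0, 2.0], zeros, zeros))
```

Because the input is added back, a block whose weights learn nothing still behaves like the identity, which is part of why stacking many such blocks does not hurt as much as stacking plain layers.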

UNet (2015): a fully convolutional network for image segmentation, called so due to its U-shaped architecture (a contracting path and an expanding path joined by skip connections). Does not need a lot of training data. Won several challenges at ISBI (International Symposium on Biomedical Imaging) in 2015.

YOLO (2015): You Only Look Once is a CNN for real-time object detection and classification. The original architecture was inspired by GoogLeNet and is implemented in the Darknet framework. It splits the input into a grid of cells; each cell predicts bounding boxes with confidence scores and class probabilities, which are later merged into a final prediction.
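
The grid idea can be sketched in a few lines: the cell that contains the center of an object is the one responsible for predicting it. This is only an illustration of the assignment step, not the network itself. The function name is hypothetical, and the default grid size of 7 and the 448×448 input follow the original YOLO paper.

```python
def responsible_cell(center_x, center_y, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center."""
    col = int(center_x / image_width * grid_size)
    row = int(center_y / image_height * grid_size)
    # A center lying exactly on the right or bottom border belongs to the last cell.
    return min(row, grid_size - 1), min(col, grid_size - 1)


# An object centered at (100, 300) in a 448x448 image with a 7x7 grid.
print(responsible_cell(100, 300, 448, 448))
```

With a 448×448 image and a 7×7 grid each cell covers 64×64 pixels, so the center (100, 300) falls into row 4, column 1.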

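EfficientNet (2019): scales network depth, width and input resolution together with a single compound coefficient, reaching higher ImageNet accuracy than ResNet with fewer parameters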

As you see, every network here is related to the CNN architecture. I decided to put the questions and theory about it into a separate article, which you can find here:

Computer vision interview: Convolutional neural network

Machine learning interview preparation: computer vision, convolutional neural network, pooling, popular convolutional…

medium.com

Do not miss it, as lots of interview questions build on understanding the simple concepts covered there.

GAN (2014, although the idea is older): Generative Adversarial Network, a concept for generating data similar to the data you feed it. A generator network turns random noise into samples, and a discriminator network tries to guess whether its input is real or fake. The two compete against each other, so the generator learns to produce output that looks more and more like the real input.

I will write more about GANs in my next articles, as there is a lot of interesting stuff to talk about.

Popular computer vision datasets:

ImageNet is one of the largest datasets, and everybody knows it because of ILSVRC, the challenge on which lots of new neural networks are evaluated. But new datasets are being prepared every day. Here are some of the most popular ones for computer vision tasks, along with useful tools for finding more:

Datasets and where to find them

Datasets to play with, datasets to know for different tasks with info and links. Dataset sources where to look for new…

medium.com

Popular computer vision questions

What does the computer vision pipeline look like?

It actually depends on the position you are applying for and the company you want to work at. Some interviewers expect you to mention data collection, some want to go through everything from task formalization to deployment (although that may not even be your job), and some just want to hear something in between. Overall, the pipeline looks something like this:

Task formalization → picking an algorithm and model architecture → data collection (& labeling if it is not present) → preprocessing and augmentation → feature extraction → model training → inference and tests → analysis and optimization → more tests → deployment

How to prepare images for training?

check that each image represents its labeled class or contains the needed data
remove all images that do not
preprocess the images (for example, resize and normalize them)
augment them using transformations appropriate for your task
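
As a rough illustration of the last two steps, here is a minimal sketch with one preprocessing step (scaling pixel values) and one augmentation (a horizontal flip). It assumes a grayscale image stored as nested lists of 8-bit values; in practice you would use a library such as OpenCV, and the function names here are my own.

```python
def normalize(image):
    """Preprocessing: scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def horizontal_flip(image):
    """Augmentation: mirror the image left to right."""
    return [row[::-1] for row in image]


image = [[0, 51, 255],
         [102, 153, 204]]

prepared = normalize(image)
augmented = horizontal_flip(prepared)  # an extra training sample made from the same image
print(augmented)
```

Whether a flip is appropriate depends on the task: it is usually fine for cats and dogs, but it changes the meaning of digits or text.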

When to use grayscale images?

Sometimes color is not relevant for a task: if you want your model to learn other features and not rely on the color of an object, grayscale can be a really good choice. Not only can it make predictions better, but as a bonus it reduces the input from three channels to one, which makes your model faster. For example, if you train a model to detect how many dots are on a die, you do not need color. You may need it for flower or bird classification, though.
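
For reference, here is a minimal sketch of the conversion itself. It assumes the image is stored as rows of (R, G, B) tuples and uses the common ITU-R BT.601 luminance weights; other weightings exist, and the function name is hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of single luminance values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


rgb_image = [[(255, 255, 255), (0, 0, 0)],
             [(0, 255, 0), (255, 0, 0)]]

gray = to_grayscale(rgb_image)
# Each pixel now holds one value instead of three.
print(gray)
```

Green gets the largest weight because the human eye is most sensitive to it, so a pure green pixel ends up brighter than a pure red one.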

How to evaluate a computer vision model?

Common evaluation metrics for machine learning models (not only for images) are accuracy, precision, recall and F1 score. I have already covered these here, so you can review them:

ML interview preparation— popular topics

Data preprocessing, augmentation, imbalanced data, regularization, activation functions and more

medium.com
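
As a quick refresher, here is a minimal sketch of precision, recall and F1 for binary labels, where 1 is the positive class. The function name is my own; libraries such as scikit-learn provide tested versions of the same metrics.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all come out at about 0.67.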

For object detection there are some special metrics:

IoU (Intersection over Union): the ratio of the overlap area between the predicted bounding box and the actual one to the area of their union. A threshold of 0.5 is usually chosen to decide whether a prediction is good, but it depends on the problem the model is solving.
IoU also helps with the problem of multiple predictions for one object: in non-maximum suppression, boxes that overlap a higher-scoring box too much are discarded, so only one (the most confident) is kept.
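
The IoU calculation is short enough to write out. This is a minimal sketch that assumes boxes are given as (x_min, y_min, x_max, y_max) tuples; the box format and the function name are my assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle.
    inter_x_min = max(box_a[0], box_b[0])
    inter_y_min = max(box_a[1], box_b[1])
    inter_x_max = min(box_a[2], box_b[2])
    inter_y_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, inter_x_max - inter_x_min)
    inter_h = max(0.0, inter_y_max - inter_y_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
actual = (1, 1, 3, 3)
print(iou(predicted, actual))
```

Here the boxes overlap in a 1×1 square and together cover 7 units of area, so the IoU is 1/7, well below the usual 0.5 threshold.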

mAP (mean average precision): a metric computed with the help of IoU, precision, recall and the precision-recall curve. First we use IoU to decide which predictions for one class count as correct, then we compute precision and recall. After that we build the precision-recall curve, whose area under the curve is the average precision, and repeat this for every class so that we can take the mean value. To dive deeper into this metric, check out this great article:

What is Average Precision in Object Detection & Localization Algorithms and how to calculate it?

A step-by-step visual guide to understanding the mean average precision for object detection and localization…

towardsdatascience.com
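
To connect the steps, here is a minimal sketch of average precision for one class and the mean over classes. It assumes each detection has already been marked as a true or false positive (for example with an IoU threshold of 0.5), and it sums the raw precision-recall curve without interpolation, while benchmarks such as PASCAL VOC and COCO use interpolated variants. All names are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real objects of this class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
            precision = true_positives / rank
            # Each true positive raises recall by 1 / num_ground_truth.
            ap += precision / num_ground_truth
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, num_ground_truth)."""
    scores = [average_precision(d, n) for d, n in per_class.values()]
    return sum(scores) / len(scores)


per_class = {
    "cat": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "dog": ([(0.6, True)], 1),
}
print(mean_average_precision(per_class))
```

For the "cat" class the two correct detections sit at ranks 1 and 3, giving precisions of 1 and 2/3 and an average precision of about 0.83; the single "dog" detection is correct, so its average precision is 1.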

How to reduce noise on an image?

(Figure: the original image is from Harry Potter and the Order of the Phoenix)

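Gaussian filters replace each pixel with a weighted average of its neighbors (weights follow a Gaussian), which smooths noise at the cost of blurring the image
Median filters replace each pixel in an image with the median value of the surrounding pixels, which works well against salt-and-pepper noise and keeps edges sharper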
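
A median filter is easy to sketch by hand. The version below assumes a grayscale image stored as nested lists and, at the borders, simply uses the neighbors that exist; that border handling and the function name are my own choices, and OpenCV offers a ready-made implementation.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    height, width = len(image), len(image[0])
    radius = size // 2
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbors that fall inside the image.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


# A flat image with one "salt" pixel in the middle.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter(noisy))
```

The outlier disappears completely, while a plain average over the same window would only smear it out: the center would become about 37 instead of 10.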

How to detect edges of an object in an image?

To find edges, we look for brightness discontinuities, that is, places where the image gradient is large.

Edge detection operators compute this directly:

Gaussian based (Canny edge detector, Laplacian of Gaussian)
gradient based (Sobel operator, Prewitt operator, Roberts operator)

Of these, the Canny edge detector is probably the most popular and is quite effective.
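
To show what a gradient based operator does, here is a minimal sketch of the Sobel operator on a grayscale image stored as nested lists. Border pixels are left at zero for simplicity, and the function name is hypothetical; this is not the full Canny pipeline, which adds smoothing, non-maximum suppression and hysteresis thresholding on top of a gradient step like this one.

```python
import math

# Sobel kernels for horizontal and vertical brightness changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Gradient magnitude for interior pixels; border pixels stay 0."""
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            result[y][x] = math.hypot(gx, gy)
    return result


# Dark left half, bright right half: a vertical edge in the middle.
image = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges[1])
```

The interior pixels next to the brightness jump get a large magnitude, while a completely flat image produces zeros everywhere.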

CNNs are also used to find edges: their first layers usually learn edge features before the deeper layers learn all the other features.

There are also recent advances in neural networks for edge detection:

CASENet (2017): performs semantic edge detection
DexiNed (2020): according to its authors, it can be trained from scratch without pretrained weights and works on various datasets without fine-tuning
RINDNet (2021): not only detects edges but also identifies their type: reflectance, illumination, normal or depth
PiDiNet (2021): lightweight and efficient edge detection

Where is computer vision used?

medical research
robotics and self-driving vehicles
manufacturing
wherever else object detection and tracking are needed
face recognition
education
architecture and design
space research and much, much more

I know there is a lot more to discuss, but this seems to me like the optimal length for an article. Thank you so much for reading this and for your support. As always, corrections and comments are welcome. See you next time.

Compliment of the day: I am not a computer, but I see you are doing a great job there. Keep on!

Maryna Klokova