ML interview preparation: computer vision

Main tasks of computer vision

  • classification — the model learns what object is in the image
  • object detection — the model finds the object's location (we can draw a bounding box around it)
  • object tracking — the model locates an object and follows where it is going next
  • face recognition — the model knows who is who
  • edge detection — the model finds where object edges are
  • segmentation — the model knows exactly which area an object occupies, so we can create a pixel-wise mask over it

Types of segmentation

  • semantic — all objects of one category are colored the same
  • instance — every object instance is distinguished from the others

Popular computer vision libraries

OpenCV — one of the first and most basic tools to reach for when familiarizing yourself with computer vision; an open-source library. On the internet you can find lots of usage examples, from finding facial features and building small recognition models to video analysis.
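
A tiny OpenCV sketch in the spirit of those tutorials: face detection with a Haar cascade that ships with the opencv-python package. The input and output file names are placeholders.

```python
import cv2

# load a pretrained frontal-face Haar cascade bundled with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

img = cv2.imread("group_photo.jpg")              # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # the detector works on grayscale

# detectMultiScale returns one (x, y, w, h) rectangle per detected face
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)
```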

Popular computer vision networks

CNN (its history started in the previous century) — the convolutional neural network concept: it detects features in an image wherever they appear and doesn't need much image preprocessing. The milestone architecture here is AlexNet, which:

  • first used consecutive stacked convolutional layers
  • first used dropout layers (the technique had only just been invented)
  • included an optimization for training on multiple GPUs
  • won ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2012, becoming the first GPU-based CNN to win an image recognition contest
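
For reference, a minimal CNN classifier sketch in PyTorch (the framework choice and the layer sizes are mine, not the article's):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # stacked convolutional layers detect features wherever they appear in the image
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),                      # dropout, as popularized by AlexNet
            nn.Linear(32 * 56 * 56, num_classes),   # assumes 224x224 RGB inputs
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = SmallCNN()
logits = model(torch.randn(1, 3, 224, 224))   # one random "image" just to check shapes
print(logits.shape)                           # torch.Size([1, 10])
```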

Popular computer vision datasets

ImageNet is one of the largest datasets and the one everybody knows, because many new neural networks are evaluated on its challenge, ILSVRC. But new datasets are being prepared every day, so it is worth keeping track of the most popular ones for your computer vision task and of the tools that help you find more.

Popular computer vision topics

Image preprocessing — the steps we take to format images before feeding them to a network for training or inference. It involves image transformations such as resizing, normalization and augmentation.

(Image from the fastai vision.augment documentation page)
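
As a rough illustration, here is what a typical preprocessing/augmentation setup can look like with torchvision (the article's figure uses fastai's vision.augment; torchvision is only a familiar stand-in, and the specific transforms are illustrative):

```python
from PIL import Image
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.Resize((224, 224)),                    # bring every image to the same size
    transforms.RandomHorizontalFlip(p=0.5),           # simple augmentation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),                            # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")        # "example.jpg" is a placeholder path
x = train_tfms(img)                                   # tensor ready to feed to a network
```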

Popular computer vision questions

What does the computer vision pipeline look like?

It really depends on the position you are applying for and the company you want to work at. Some interviewers expect you to mention data collection, some want to hear about everything from task formalization to deployment (even if that would not be your job), and some want something in between. So overall the pipeline covers roughly these stages: task formalization, data collection and labeling, preprocessing, model training and evaluation, and deployment.

How to prepare images for training?

  • check that each image represents the labeled class or contains the needed data (see the sketch after this list)
  • remove all other images
  • preprocess the images
  • augment them using transformations appropriate for your task
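
A hypothetical helper for the first two points: keep only images that open correctly and whose parent folder matches an expected class label (the folder layout and the label set are assumptions for illustration):

```python
from pathlib import Path
from PIL import Image

EXPECTED_CLASSES = {"cat", "dog"}               # placeholder label set

def collect_clean_images(root: str) -> list[Path]:
    kept = []
    for path in Path(root).glob("**/*.jpg"):
        label = path.parent.name                # assume one folder per class
        if label not in EXPECTED_CLASSES:
            continue                            # drop images that don't carry needed data
        try:
            with Image.open(path) as img:
                img.verify()                    # skip corrupted / unreadable files
        except Exception:
            continue
        kept.append(path)
    return kept

clean = collect_clean_images("dataset/")        # placeholder dataset folder
```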

When to use grayscale images?

Sometimes color is not relevant for a task: if you want your model to learn other features and not latch onto the color of an object, converting to grayscale can be a good choice. Not only can it make predictions better, but as a bonus it reduces computation, since a grayscale image has one channel instead of three. For example, if you train a model to count the number of dots on a die, you do not need color. You may need it for flower or bird classification, though.
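
Grayscale conversion is a one-liner in OpenCV (cv2.cvtColor is a real call; the file names are placeholders):

```python
import cv2

img = cv2.imread("dice.jpg")                    # loaded as a 3-channel BGR image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # single-channel grayscale
cv2.imwrite("dice_gray.jpg", gray)
```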

How to evaluate a computer vision model?

Common evaluation metrics for machine learning models (not only for images) are: accuracy, precision and recall, F1 score. I have already mentioned these before, so you can revise them there.
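
For completeness, a quick sketch of these metrics with scikit-learn (the label arrays are toy data, not results from any real model):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1]   # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```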

How to reduce noise on an image?

(Original image is from Harry Potter and the Order of the Phoenix)
  • Median filters replace each pixel in an image with the median value of the surrounding pixels
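
A denoising sketch using OpenCV's median filter (cv2.medianBlur is a real call; the kernel size and file names are illustrative):

```python
import cv2

noisy = cv2.imread("noisy_frame.jpg")       # placeholder input
denoised = cv2.medianBlur(noisy, 5)         # each pixel becomes the median of its 5x5 neighborhood
cv2.imwrite("denoised_frame.jpg", denoised)
```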

How to detect edges of an object in an image?

To find where edges are, we look for brightness discontinuities, i.e. image gradients. Popular approaches include:

  • gradient-based operators (Sobel, Prewitt, Roberts)
  • DexiNed (2020) — doesn't need prior training and works on various datasets without the need for fine-tuning
  • RINDNet (2021) — not only detects edges but also classifies their type: normal, illumination, depth, reflection
  • PiDiNet (2021) — lightweight and efficient edge detection
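
A minimal gradient-based sketch with OpenCV (cv2.Sobel and cv2.Canny are real APIs; the thresholds and file names are illustrative):

```python
import cv2

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder input, read as grayscale

# Sobel gradients along x and y highlight brightness discontinuities
grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

# Canny combines gradient magnitude with hysteresis thresholding into thin edges
edges = cv2.Canny(gray, 100, 200)
cv2.imwrite("edges.jpg", edges)
```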

Where is computer vision used?

  • medical research
  • robotics and self-driving vehicles
  • manufacturing
  • wherever else object detection and tracking are needed
  • face recognition
  • education
  • architecture and design
  • space research and much, much more