Object Tracking


Introduction


Computer vision is fundamental to modern robotics, serving as the eyes of machines and enabling them to interpret and understand the visual world. At its core, computer vision seeks to replicate the capabilities of human vision by allowing robots to identify, process, and react to visual data. One of the critical applications in this domain is human face detection, which has become increasingly prevalent in security systems, user authentication, and interactive robots that engage with humans. Examples range from smartphone security features such as face unlock, where a face must be detected before it can be recognized, to social robots in public spaces that recognize and interact with individuals.

The advent of Convolutional Neural Networks (CNNs) has significantly advanced the field, surpassing traditional methods such as Haar cascades in both accuracy and reliability. CNNs, loosely inspired by how the brain’s visual system processes images, automatically learn spatial hierarchies of features from images, from simple edges in early layers to whole facial parts in deeper ones. This allows faces to be detected more accurately and under more challenging conditions, such as varying angles, poses, and lighting, which makes CNNs particularly effective in real-world applications. Their depth and flexibility have led to widespread adoption not only in face detection but across a broad range of robotic vision tasks, improving how machines perceive and interact with their surroundings.

In this section, we briefly follow the evolution of face detection algorithms and show how they can be applied to a robot. You can complete the tasks in either MATLAB or Python. Do not do both unless you want an extra challenge.

Demonstration of BEATRIX’s Face-Tracking Capability

MATLAB Implementation (Option 1)


Easy Task: Implementing Face Detection with Haar Cascade Classifiers

Central to early face detection methods is the Haar cascade. Its name comes from Haar wavelets, introduced by the mathematician Alfréd Haar. The technique uses Haar-like features: simple contrast features, each the difference between the pixel sums of adjacent rectangles, which can be computed very quickly using an integral image. A cascade of classifiers is trained on these features and finds faces by scanning the image at multiple scales for patterns that match a face; the early stages of the cascade cheaply reject most non-face regions, so the later stages only examine promising candidates. The classifier distinguishes faces from the background using contrast patterns characteristic of human faces, such as the region around the eyes being darker than the cheeks. The Haar cascade classifier was developed by Paul Viola and Michael Jones, who adapted the idea of Haar wavelets into these features, and it has been widely used in all kinds of devices over the last 20 years. (You do not need to understand the maths for this, but if you wish to, you can find the academic paper here, as well as the working principle.)
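To make the integral-image idea concrete, here is a minimal pure-Python sketch. Each entry of the integral image holds the sum of every pixel above and to the left of it, so the sum over any rectangle needs only four lookups, whatever its size. The sketch then evaluates one two-rectangle Haar-like feature (dark band above a light band) on a toy image; the image, the feature layout and the function names are illustrative assumptions, not OpenCV’s internal implementation.

```python
def integral_image(img):
    """Return an integral image with an extra leading row and column of zeros.

    ii[y][x] holds the sum of img[0..y-1][0..x-1]; the zero padding removes
    edge cases from the rectangle-sum formula below.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative vertical two-rectangle Haar-like feature.

    Returns (sum of bottom half) - (sum of top half); a large positive value
    means a dark band sits above a lighter one, e.g. eyes above cheeks.
    """
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


if __name__ == "__main__":
    # 4x4 toy greyscale image: dark top half (10), bright bottom half (200)
    image = [
        [10, 10, 10, 10],
        [10, 10, 10, 10],
        [200, 200, 200, 200],
        [200, 200, 200, 200],
    ]
    ii = integral_image(image)
    print(rect_sum(ii, 0, 0, 4, 4))          # 1680
    print(two_rect_feature(ii, 0, 0, 4, 4))  # 1520
```

A real detector evaluates thousands of such features, at many positions and scales, inside each stage of the cascade; the constant-time rectangle sum is what keeps that affordable.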

Here is a useful video that covers computer vision and its relevance:

Objective

Use Haar cascade classifiers to detect faces in a controlled environment (e.g. consistent lighting, minimal background clutter, frontal faces).

Tasks

  1. Setup: Install OpenCV and familiarize yourself with its basic functionalities.
    • If you use MATLAB, you need to install the Computer Vision Toolbox.
    • To use OpenCV from MATLAB, import its interface packages by executing:
import clib.opencv.*;
import vision.opencv.util.*;

You can find more information on the MATLAB website.

  2. Face Detection: Implement face detection using OpenCV’s pre-trained Haar cascade classifier for faces. Test the implementation on static images to check that it identifies faces accurately.
  3. Robot Control (Basic): Develop a simple control system in which the robot moves in a predetermined direction when it detects a face. For instance, the robot could move for a few seconds once a face is detected and stop when no face is detected.
    • Start with a simple decision based on whether the face is left or right of the centre of the frame, using @MOVRALL 10 10 10 10 10 10 or @MOVRALL -10 -10 -10 10 10 10

Challenges

Working with Haar cascade classifiers is relatively straightforward, but expect challenges in dealing with false positives and negatives, especially under varying lighting conditions or with different face orientations.

Extension

Use an HSV filter on the input image to examine false positives.

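HSV stands for Hue, Saturation and Value. It is a colour space, an alternative to Red, Green and Blue (RGB) for representing each pixel’s colour as numbers: hue is the type of colour (its angle on the colour wheel), saturation is how intense it is, and value is how bright it is. Just as we can mix and adjust colours in RGB, we can do the same in HSV. The sample MATLAB script below lets you experiment with it: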

webcam_continuous_hsv()

function webcam_continuous_hsv()
    % Initialize webcam
    cam = webcam;
    % Create figure and axes
    fig = figure('Name', 'Continuous Webcam Feed with HSV Slider', 'NumberTitle', 'off');
    ax = axes('Parent', fig);
    % Create HSV sliders
    h_slider = uicontrol('Style', 'slider', 'Min', 0, 'Max', 360, ...
        'Value', 0, 'Position', [20 20 120 20], 'Callback', @update_hsv);
    s_slider = uicontrol('Style', 'slider', 'Min', 0, 'Max', 1, ...
        'Value', 1, 'Position', [160 20 120 20], 'Callback', @update_hsv);
    v_slider = uicontrol('Style', 'slider', 'Min', 0, 'Max', 1, ...
        'Value', 1, 'Position', [300 20 120 20], 'Callback', @update_hsv);
    % Create labels for sliders
    h_label = uicontrol('Style', 'text', 'String', 'Hue', 'Position', [70 45 40 15]);
    s_label = uicontrol('Style', 'text', 'String', 'Saturation', 'Position', [210 45 60 15]);
    v_label = uicontrol('Style', 'text', 'String', 'Value', 'Position', [350 45 40 15]);
    % Initialize HSV values
    h_value = 0;
    s_value = 1;
    v_value = 1;
    % Continuously update the image
    while ishandle(fig)
        % Capture image from webcam
        img = snapshot(cam);
        % Convert image to HSV
        img_hsv = rgb2hsv(img);
        % Adjust HSV values
        img_hsv(:, :, 1) = mod(img_hsv(:, :, 1) + h_value/360, 1);
        img_hsv(:, :, 2) = img_hsv(:, :, 2) * s_value;
        img_hsv(:, :, 3) = img_hsv(:, :, 3) * v_value;
        % Convert back to RGB
        img_output = hsv2rgb(img_hsv);
        % Display updated image
        imshow(img_output, 'Parent', ax);
        % Pause briefly to control the frame rate
        pause(0.1);
    end
    % Function to update HSV values
    function update_hsv(~, ~)
        % Get current slider values
        h_value = get(h_slider, 'Value');
        s_value = get(s_slider, 'Value');
        v_value = get(v_slider, 'Value');
    end
end
HSV Example MATLAB Script.

Intermediate Task: Enhancing Face Detection with CNNs

Objective

Use a pre-trained CNN (such as MTCNN, or a model imported from TensorFlow or PyTorch) for more robust face detection in varied lighting, backgrounds, and face orientations.

Tasks

  1. Setup:
    • Use MATLAB with the Deep Learning Toolbox installed
    • Are there other packages that do this?
  2. Face Detection with CNN: Implement face detection using a pre-trained CNN model such as MTCNN (Multi-task Cascaded Convolutional Networks) or a similar architecture. Test the model’s performance on a wider range of images, including those with complex backgrounds and varying lighting conditions.
  3. Robot Control (Intermediate): Develop a more sophisticated control system in which the robot’s movement depends on the position of the detected face in the image frame. Add PID control to the movement to smooth out the motion.

Challenges

While CNNs offer improved accuracy, they require more computational resources. Handling real-time video streams for face detection and processing could introduce latency.

An example MATLAB script for face detection is provided below.

% Use the Add-On Explorer in MATLAB to install Deep Learning Toolbox,
% the MTCNN Face Detection add-on and the MATLAB Support Package for USB Webcams.
% Uses: Justin Pinkney (2024). MTCNN Face Detection (https://github.com/matlab-deep-learning/mtcnn-face-detection/releases/tag/v1.2.4), GitHub.
% Create a webcam object (adjust the index if you have multiple webcams)
cam = webcam;
% Open a figure and axes to display the output
fig = figure;
ax = axes('Parent', fig);
% Loop until the figure is closed or 'q' is pressed
while ishandle(fig)
    % Capture one frame
    img = snapshot(cam);
    % Detect faces in the captured frame
    % [bboxes, scores, landmarks] = mtcnn.detectFaces(img) is defined by the MTCNN package
    [bboxes, scores, landmarks] = mtcnn.detectFaces(img);
    % Display the captured frame
    imshow(img, 'Parent', ax);
    title(ax, 'MTCNN Face Detection');
    % Draw bounding boxes ([x y width height]) around detected faces
    hold(ax, 'on');
    for ii = 1:size(bboxes, 1)
        rectangle('Position', bboxes(ii, :), 'EdgeColor', 'r', 'LineWidth', 2, 'Parent', ax);
    end
    hold(ax, 'off');
    % Include a small delay to reduce processing load if necessary
    pause(0.01);
    % Exit if the figure was closed or 'q' was pressed while it had focus
    if ~ishandle(fig) || strcmp(get(fig, 'CurrentKey'), 'q')
        break;
    end
end
% Clear the webcam object when done
clear('cam');
Example Face-Detection MATLAB Script.

Advanced Task: Advanced Robot Control Based on Face Detection and Additional Features

Objective

Combine face detection with additional features (such as facial expressions or the number of faces) using a CNN to control the robot’s movement in more complex ways.

Tasks

  1. Advanced Face Detection: Implement advanced face detection that not only identifies the presence and position of faces but can handle and identify multiple faces using a CNN.
  2. Robot Control (Advanced): Create an advanced control system where the robot’s actions are determined by complex inputs, such as moving towards the happiest face in the group, following a person as they move, or changing its behaviour based on the number of faces detected.
  3. Integration and Testing: Integrate the face detection system with BEATRIX’s control mechanism, ensuring smooth operation. Test the system in a dynamic environment, simulating real-world conditions as closely as possible.

Challenges

This level requires dealing with real-time data processing and decision-making, managing computational demands, and ensuring the robot’s movements are smooth and responsive.

Python Implementation (Option 2)


Easy Task: Implementing Face Detection with Haar Cascade Classifiers

Objective

Use Haar cascade classifiers to detect faces in a controlled environment (e.g., consistent lighting, minimal background clutter, frontal faces).

Tasks

  1. Setup: Install OpenCV and familiarize yourself with its basic functionalities.
    • If you use Anaconda (Python), create and set up an environment with:
conda create --name beatrix_pyt_env python=3.8
conda activate beatrix_pyt_env
conda install pytorch torchvision torchaudio -c pytorch
conda install pyserial opencv
    • If this does not work, use pip instead (not recommended, as mixing pip and conda packages can cause conflicts)
  2. Face Detection: Implement face detection using OpenCV’s pre-trained Haar cascade classifier for faces. Test the implementation on static images to check that it identifies faces accurately.
  3. Robot Control (Basic): Develop a simple control system in which the robot moves in a predetermined direction when it detects a face. For instance, the robot could move for a few seconds once a face is detected and stop when no face is detected.
    • Start with a simple decision based on whether the face is left or right of the centre of the frame, using @MOVRALL 10 10 10 10 10 10 or @MOVRALL -10 -10 -10 10 10 10
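A minimal sketch of the Easy task in Python, assuming the OpenCV install above. It loads OpenCV’s bundled frontal-face cascade from `cv2.data.haarcascades`, runs `detectMultiScale` on each greyscale frame, picks the largest face and maps its horizontal position to one of the two `@MOVRALL` commands. Which command turns BEATRIX which way, the dead-band width and the helper names are assumptions; sending the command over serial is left out because the port, baud rate and line terminator are not given here.

```python
def face_centre_x(box):
    """Horizontal centre of an OpenCV (x, y, w, h) bounding box."""
    x, _, w, _ = box
    return x + w / 2


def choose_command(boxes, frame_width, dead_band=0.1):
    """Map the largest detected face to a movement command, or None for no move.

    ASSUMPTION: which @MOVRALL command turns the robot left or right must be
    checked on BEATRIX; swap them if it turns the wrong way. dead_band is the
    fraction of the frame width around the centre where no command is sent,
    so the robot does not jitter when the face is already centred.
    """
    if len(boxes) == 0:
        return None  # no face detected: send no movement command
    # Follow the largest face, which is usually the closest person
    largest = max(boxes, key=lambda b: b[2] * b[3])
    offset = face_centre_x(largest) - frame_width / 2
    if abs(offset) < dead_band * frame_width:
        return None
    if offset < 0:
        return "@MOVRALL 10 10 10 10 10 10"     # assumed: face left of centre
    return "@MOVRALL -10 -10 -10 10 10 10"      # assumed: face right of centre


def main():
    import cv2  # imported here so the helpers above work without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Raising minNeighbors trades some missed faces for fewer false positives
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)),
                          (0, 0, 255), 2)
        command = choose_command(faces, frame.shape[1])
        cv2.putText(frame, str(command), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
        cv2.imshow("Haar face detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```

Keeping the decision in a plain function that takes boxes and a frame width makes it easy to test on static images first, as the task suggests, before connecting it to the robot.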

Challenges

Working with Haar cascade classifiers is relatively straightforward, but expect challenges in dealing with false positives and negatives, especially under varying lighting conditions or with different face orientations.

Extension

Experiment with an HSV filter on the input image to examine the false-positive behaviour. With a pre-trained classifier, adjusting colours often makes face detection worse rather than better, but it is still worth examining how such filtering could be used to improve the results.

The script below is an exemplar Python script that adjusts the HSV values of images captured from a connected camera.

import sys

import cv2
import numpy as np

WINDOW_NAME = 'Webcam with HSV Slider'


class WebcamHSVAdjuster:
    def __init__(self, camera_index=0):
        self.cap = cv2.VideoCapture(camera_index)
        if not self.cap.isOpened():
            sys.exit("Error: Could not open webcam.")
        cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)
        # Initialize HSV adjustment values (OpenCV 8-bit HSV: H 0-179, S and V 0-255)
        self.current_hue = 0
        self.current_saturation = 255
        self.current_value = 255
        # Create HSV sliders; OpenCV calls update_hsv_values whenever one moves
        cv2.createTrackbar('Hue', WINDOW_NAME, 0, 179, self.update_hsv_values)
        cv2.createTrackbar('Saturation', WINDOW_NAME, 255, 255, self.update_hsv_values)
        cv2.createTrackbar('Value', WINDOW_NAME, 255, 255, self.update_hsv_values)

    def update_hsv_values(self, _position):
        # Update HSV adjustment values from trackbars
        self.current_hue = cv2.getTrackbarPos('Hue', WINDOW_NAME)
        self.current_saturation = cv2.getTrackbarPos('Saturation', WINDOW_NAME)
        self.current_value = cv2.getTrackbarPos('Value', WINDOW_NAME)

    def run(self):
        while True:
            # Read the current frame from the webcam
            ret, frame = self.cap.read()
            if not ret:
                print("Error: Could not read frame.")
                break
            # Convert the frame to HSV color space
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            # Do the arithmetic in int16: adding to uint8 directly would wrap
            # around past 255 before the modulo or clip could take effect
            h, s, v = (hsv[:, :, i].astype(np.int16) for i in range(3))
            # Apply the current HSV adjustments
            hsv[:, :, 0] = (h + self.current_hue) % 180
            hsv[:, :, 1] = np.clip(s + self.current_saturation - 255, 0, 255)
            hsv[:, :, 2] = np.clip(v + self.current_value - 255, 0, 255)
            # Convert the frame back to BGR color space
            edited_frame = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
            # Display the edited frame
            cv2.imshow(WINDOW_NAME, edited_frame)
            # Exit the loop if 'q' is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release the webcam and destroy all windows
        self.cap.release()
        cv2.destroyAllWindows()


if __name__ == "__main__":
    WebcamHSVAdjuster().run()
Example HSV Python Script.
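One way to use HSV to examine false positives is to measure how much of each detected box actually falls within a skin-like colour range and flag boxes with very little. The sketch below does this in NumPy for an HSV image in OpenCV’s 8-bit scale (H 0-179, S and V 0-255). The range and the 0.3 threshold are rough assumptions: skin tones and lighting vary widely, reddish hues near H = 179 may need a second range, and a strict threshold can reject genuine faces, so treat this as a diagnostic to tune with the slider script above rather than a reliable filter.

```python
import numpy as np

# ASSUMED skin-like range in OpenCV's 8-bit HSV scale; tune it for your setup
LOWER_HSV = np.array([0, 40, 60], dtype=np.uint8)
UPPER_HSV = np.array([25, 255, 255], dtype=np.uint8)


def in_range_mask(hsv, lower=LOWER_HSV, upper=UPPER_HSV):
    """Boolean mask of pixels whose H, S and V all lie within [lower, upper].

    Same idea as cv2.inRange, written in NumPy so it runs without OpenCV.
    """
    return np.all((hsv >= lower) & (hsv <= upper), axis=-1)


def skin_fraction(hsv, box):
    """Fraction of pixels inside an (x, y, w, h) box that fall in the HSV range."""
    x, y, w, h = box
    patch = hsv[y:y + h, x:x + w]
    if patch.size == 0:
        return 0.0
    return float(in_range_mask(patch).mean())


def split_detections(hsv, boxes, min_fraction=0.3):
    """Split boxes into (likely faces, suspected false positives)."""
    kept, rejected = [], []
    for box in boxes:
        if skin_fraction(hsv, box) >= min_fraction:
            kept.append(box)
        else:
            rejected.append(box)
    return kept, rejected
```

In a detection loop you would convert each frame with `cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)`, pass the Haar detections to `split_detections`, and draw the rejected boxes in a different colour to see which detections the colour check disagrees with.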

Intermediate Task: Enhancing Face Detection with CNNs

Objective

Use a CNN, such as a pre-trained model from TensorFlow/Keras (easier) or PyTorch (recommended to learn), for more robust face detection in varied lighting, backgrounds, and face orientations.

Tasks

  1. Setup: Choose and set up a deep learning framework (TensorFlow or PyTorch). With Anaconda, PyTorch can be set up as follows:
conda create --name beatrix_pyt_dev python=3.9
conda activate beatrix_pyt_dev
conda install pytorch torchvision torchaudio -c pytorch
conda install pyserial opencv
  2. Face Detection with CNN: Implement face detection using a pre-trained CNN model such as MTCNN (Multi-task Cascaded Convolutional Networks) or a similar architecture. Test the model’s performance on a wider range of images, including those with complex backgrounds and varying lighting conditions.
  3. Robot Control (Intermediate): Develop a more sophisticated control system in which the robot’s movement depends on the position of the detected face in the image frame. Add PID control to the movement to smooth out the motion.
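A minimal discrete PID sketch for the intermediate control task. The error is the face centre’s horizontal offset from the frame centre, normalised to [-1, 1], and the output is clamped to a limit; the integral only accumulates while the output is not saturated (a simple anti-windup scheme). The gains, the output limit, and how the output maps onto BEATRIX’s `@MOVRALL` arguments are assumptions to tune on the robot.

```python
class PID:
    """Discrete PID controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, kd, output_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.output_limit = output_limit
        self.integral = 0.0
        self.prev_error = None

    def reset(self):
        """Clear the controller state, e.g. when the face is lost."""
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step dt > 0 (s)."""
        # Derivative term is zero on the first sample, when there is no history
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        candidate_integral = self.integral + error * dt
        output = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        clamped = max(-self.output_limit, min(self.output_limit, output))
        # Anti-windup: keep the new integral only if the output is not saturated
        if clamped == output:
            self.integral = candidate_integral
        return clamped


def normalised_error(face_x, frame_width):
    """Offset of the face centre from the frame centre, scaled to [-1, 1]."""
    half = frame_width / 2
    return (face_x - half) / half
```

In the capture loop you might call `pid.update(normalised_error(face_x, frame_width), dt)` once per frame, with `dt` measured using `time.perf_counter()`, and call `reset()` when no face is visible. Rounding the output and substituting it into the `@MOVRALL` arguments is a hypothetical mapping; check what each of the six values controls on BEATRIX before relying on it.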

Challenges

While CNNs offer improved accuracy, they require more computational resources. Handling real-time video streams for face detection and processing could introduce latency.
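One common way to keep the loop responsive when the CNN is slow is to run the detector only on every Nth frame and reuse its last result in between. The sketch wraps any detection function this way and records how long each detection took. Its `main()` uses the `MTCNN` class from the third-party `facenet-pytorch` package, which is an assumption: it is not installed by the conda commands above (it is available via `pip install facenet-pytorch`), so check its README for the current API. The value of N and the camera index are also assumptions.

```python
import time


class FrameSkippingDetector:
    """Run an expensive detector every `every_n` frames and reuse its last result.

    `detect_fn` is any callable taking a frame and returning a list of boxes.
    """

    def __init__(self, detect_fn, every_n=3):
        self.detect_fn = detect_fn
        self.every_n = every_n
        self.frame_count = 0
        self.last_boxes = []
        self.last_detect_seconds = 0.0

    def __call__(self, frame):
        if self.frame_count % self.every_n == 0:
            start = time.perf_counter()
            self.last_boxes = self.detect_fn(frame)
            # Record how long the detector took, to help choose every_n
            self.last_detect_seconds = time.perf_counter() - start
        self.frame_count += 1
        return self.last_boxes


def main():
    # ASSUMPTION: facenet-pytorch's MTCNN; install it separately
    import cv2
    from facenet_pytorch import MTCNN
    from PIL import Image

    mtcnn = MTCNN(keep_all=True)  # keep_all=True returns every face, not just one

    def detect(frame_bgr):
        rgb = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        boxes, _ = mtcnn.detect(rgb)  # [x1, y1, x2, y2] per face, or None
        return [] if boxes is None else boxes.tolist()

    detector = FrameSkippingDetector(detect, every_n=3)
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for x1, y1, x2, y2 in detector(frame):
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
        cv2.imshow("MTCNN (every 3rd frame)", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()
```

The trade-off is that the boxes can lag the real face by up to `every_n - 1` frames; for a slowly turning robot this is usually acceptable, and smoothing the face position helps hide the jumps between detections.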

Advanced Task: Advanced Robot Control Based on Face Detection and Additional Features

Objective

Combine face detection with additional features (such as facial expressions or the number of faces) using a CNN to control the robot’s movement in more complex ways.

Tasks

  1. Advanced Face Detection: Implement advanced face detection that not only identifies the presence and position of faces but can handle and identify multiple faces using a CNN.
  2. Robot Control (Advanced): Create an advanced control system where the robot’s actions are determined by complex inputs, such as moving towards the happiest face in the group, following a person as they move, or changing its behaviour based on the number of faces detected.
  3. Integration and Testing: Integrate the face detection system with BEATRIX’s control mechanism, ensuring smooth operation. Test the system in a dynamic environment, simulating real-world conditions as closely as possible.
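For the advanced control task, one way to organise "complex inputs" is a small decision function that looks at all detections in a frame and returns a behaviour plus a target face. In this sketch the behaviour depends on the number of faces: search when nobody is visible, follow a single person, and head for the highest-scoring face in a group. The behaviour names are hypothetical, and `score` is a placeholder that could hold the detector’s confidence or, for "the happiest face", the output of a separate expression classifier you would need to add.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Face:
    box: Tuple[float, float, float, float]  # (x, y, w, h) in pixels
    score: float  # hypothetical: detector confidence or e.g. a "happiness" score


def decide(faces: List[Face]) -> Tuple[str, Optional[Face]]:
    """Return (behaviour, target face) for the current frame.

    The behaviour names are placeholders; map each one onto BEATRIX commands.
    """
    if not faces:
        return "search", None          # nobody in view: e.g. slowly scan the room
    if len(faces) == 1:
        return "follow", faces[0]      # one person: keep them centred
    # Several people: head for the face with the highest score
    return "approach_best", max(faces, key=lambda f: f.score)
```

Separating the decision from both the detector and the motor commands lets you test the behaviour logic with hand-made `Face` lists before running it on the robot.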

Challenges

This level requires dealing with real-time data processing and decision-making, managing computational demands, and ensuring the robot’s movements are smooth and responsive.
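For smooth movement, a common first step is to low-pass filter the noisy face position before it reaches the controller. The sketch below is an exponential moving average; the default `alpha` is an assumption: larger values respond faster but smooth less, and smaller values smooth more at the cost of lag.

```python
from typing import Optional


class EMASmoother:
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        """Feed a new measurement and return the smoothed value."""
        if self.value is None:
            self.value = x  # first sample: start from the measurement itself
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value

    def reset(self) -> None:
        """Forget the history, e.g. when the tracked face is lost."""
        self.value = None
```

Feeding the smoothed face centre, rather than the raw one, into the control system reduces jitter from detection noise; resetting when the face disappears avoids the robot drifting towards a stale position.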