When an incident takes place, we usually want to detect and track the person responsible for it, meticulously and methodically. With the introduction of surveillance cameras, monitoring people has become as straightforward as looking at a computer screen. Researchers aim to propose useful algorithms and tools that support humans in detecting suspicious targets, recognising a required target, and even tracking that target. Now that high-quality, high-frame-rate surveillance cameras are widely used, more efficient methods that yield higher accuracy are needed. In this article we focus on object tracking and talk about its use cases, methodologies and difficulties. The aim of this post is not to dive deep into details, but to give you an overall idea of the available techniques and methods. We will cover multiple scenarios, namely single object tracking, multiple object tracking and multi camera tracking.
Tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around the scene.
In other words, a tracker assigns consistent labels to the tracked objects in different frames of a video. Although this sounds like a trivial task, tracking objects in a video sequence can be complex. Difficulties arise from rapid appearance changes caused by image noise, illumination changes and non-rigid body motion, as well as from unstable backgrounds, occlusions and interactions between multiple objects. The loss of information caused by projecting the 3D world onto a 2D image is a further challenge.
Application Domains of Visual Object Tracking
Monitoring, Assistance, Surveillance, Control, Defense
Robotics, Autonomous Car Driving, Rescue
Human Computer Interaction
Film Production and Post Production: Motion Capture, Editing, Video Stabilization
Management of Video Content: Indexing and Search
Action and Activity Recognition
Team Sports: Game Analysis, Player Statistics, Video Annotation
Need for Object Tracking
Most people start with object detection in machine learning and computer vision, and are often tempted to ask why we need object tracking when we can detect objects in every frame. There are a few reasons why tracking is beneficial compared to detecting objects in each frame:
In case of multiple objects, tracking helps establish the identity of the objects across frames.
In some cases, object detection may fail but it may still be possible to track the object because tracking takes into account the location and appearance of the object in the previous frame.
Some tracking algorithms are very fast because they do a local search instead of a global search.
How Does Object Tracking Work?
There are two key steps in the object tracking process: first, detection of an object in a given scenario, and second, frame-by-frame tracking of that object. To perform tracking in video sequences, an algorithm analyses sequential video frames and outputs the movement of the target between the frames. Many tracking algorithms have been proposed so far. These object tracking methods are classified according to their tracking behaviour. The classification is based either on the features that are extracted from an image or on the representation of the appearance or motion of the object.
An object is simply an entity of interest. An object can be represented either by its shape or by its appearance. Different ways of representing shape are: (a) centroid, (b) multiple points, (c) rectangular patch, (d) elliptical patch, (e) part-based multiple patches, (f) object skeleton, (g) complete object contour, (h) control points on object contour, (i) object silhouette. The choice of representation depends on factors such as the rigidity, size or articulation of the object.
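As a rough illustration of two of the simpler shape representations above, the sketch below derives a centroid and a rectangular patch (bounding box) from a binary object mask. The function name `centroid_and_bbox` and the list-of-rows mask format are assumptions made for this example; plain Python lists are used only to keep it dependency-free.

```python
def centroid_and_bbox(mask):
    """Return ((cx, cy), (x_min, y_min, x_max, y_max)) for a binary mask.

    `mask` is a list of rows; non-zero entries belong to the object.
    (Hypothetical helper, written for illustration only.)
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("mask contains no object pixels")
    # Centroid: mean position of all object pixels.
    centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
    # Rectangular patch: tightest axis-aligned box around the object.
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return centroid, bbox


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(centroid_and_bbox(mask))  # ((2.0, 1.5), (1, 1, 3, 2))
```

A centroid is enough for small, far-away objects, while a rectangular patch also carries the object's extent.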
Similarly, there are various ways to represent the appearance features of objects. Note that shape representations can be combined with appearance representations for tracking. Some common appearance representations in object tracking are: probability densities of object appearance features (color, texture), active appearance models, and multi-view appearance models.
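One simple way to build a probability density of a color feature is a normalised color histogram. The sketch below is only an illustration: the function names, the choice of 4 bins per channel, and the use of the Bhattacharyya coefficient to compare two histograms are assumptions of this example, not something prescribed by the methods listed above.

```python
import math


def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram of an iterable of (r, g, b) pixels.

    Each channel (0..255) is quantised into `bins` levels; the result sums
    to 1 and can be read as a discrete probability density.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        rb, gb, bb = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
        count += 1
    if count == 0:
        raise ValueError("no pixels given")
    return [h / count for h in hist]


def bhattacharyya(p, q):
    """Similarity of two densities: 1.0 for identical, 0.0 for disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


red_patch = [(250, 10, 10)] * 8
blue_patch = [(10, 10, 250)] * 8
model = color_histogram(red_patch)
print(bhattacharyya(model, color_histogram(red_patch)))   # 1.0
print(bhattacharyya(model, color_histogram(blue_patch)))  # 0.0
```

A tracker could compare the stored model histogram against candidate regions in the next frame and keep the most similar one.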
Once the representation is finalized, some sort of object detection technique is required to detect the object and initialize the tracking. The object detection technique can use one of the above mentioned features (color, texture) to detect the object of interest.
Common object detection techniques are:
Point Detectors: Point detectors are used to find interest points in images, that is, points with an expressive texture in their local neighbourhood. A desirable quality of an interest point is its invariance to changes in illumination and camera viewpoint. In the literature, commonly used interest point detectors include Moravec’s detector, the Harris detector, the KLT detector and the SIFT detector.
Background Subtraction: Object detection can be achieved by building a representation of the scene called the background model and then finding deviations from the model for each incoming frame. There are various methods of background subtraction, such as frame differencing, region-based (or spatial) information, hidden Markov models (HMM) and eigenspace decomposition.
Object Detection and Segmentation: Images are searched globally to detect and localize the object of interest. Convolutional neural networks can be used for detecting objects, or some kind of feature classifier can be used with a sliding-window technique (not very common because of the time the sliding-window approach takes).
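Of the techniques above, frame differencing is the easiest to sketch. The example below assumes the background model is a single grayscale reference frame and marks a pixel as foreground when it deviates from that frame by more than a threshold. The function name and the threshold of 25 are arbitrary choices for this illustration; a practical system would normally use an adaptive background model.

```python
def frame_difference(background, frame, threshold=25):
    """Return a binary foreground mask for a grayscale frame.

    A pixel is foreground (1) when it differs from the background model
    by more than `threshold` intensity levels, otherwise background (0).
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


background = [
    [10, 10, 10],
    [10, 10, 10],
]
frame = [
    [12, 200, 10],   # small noise at (0, 0), moving object at (0, 1)
    [10, 180, 9],
]
print(frame_difference(background, frame))  # [[0, 1, 0], [0, 1, 0]]
```

The resulting mask can be turned into an object region, which is then handed to the tracker for initialization.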
The tasks of detecting the object and establishing a correspondence between the object instances across frames can be performed either separately or jointly. In the first case, possible object regions in every frame are obtained by means of an object detection algorithm, and the tracker then establishes the correspondence between objects across frames. In the latter case, the object region and the correspondence are jointly estimated by iteratively updating the object location and region information obtained from previous frames.
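For the first case, where detection and correspondence are handled separately, the correspondence step could be sketched as a greedy match on bounding-box overlap (intersection over union, IoU). This is just one possible association rule, chosen here for brevity. The function names, the `(x1, y1, x2, y2)` box format and the `min_iou` threshold of 0.3 are assumptions of the example, and tracks without a matching detection are simply dropped.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Carry track ids from the previous frame over to new detections.

    tracks: dict mapping track id -> box in the previous frame.
    detections: list of boxes detected in the current frame.
    Returns a dict id -> box for the current frame; detections that match
    no existing track are given fresh ids.
    """
    # Score every (track, detection) pair and take the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in updated or j in used:
            continue
        updated[tid] = detections[j]
        used.add(j)
    next_id = max(tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated


tracks = {0: (10, 10, 50, 50), 1: (100, 100, 140, 140)}
detections = [(102, 101, 142, 141), (12, 11, 52, 51), (200, 200, 220, 220)]
print(associate(tracks, detections))
```

In this example the first two detections keep the ids 1 and 0 of the objects they overlap with, and the third detection starts a new track with id 2.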
Point Tracking: Moving objects are represented by their feature points during tracking. Point tracking is a complex problem, particularly in the presence of occlusions and false detections of the object. Once these points have been identified, recognition can be done relatively simply, by thresholding.
Kernel Based Tracking: Kernel tracking is usually performed by computing the motion of the moving object, which is represented by a primitive object region, from one frame to the next. The object motion is usually in the form of a parametric motion such as translation, conformal or affine. These algorithms differ in terms of the appearance representation used, the number of objects tracked, and the method used to estimate the object motion.
Silhouette Based Tracking Approach: Some objects have complex shapes, such as hands, fingers and shoulders, that cannot be described well by simple geometric shapes. Silhouette-based methods provide an accurate shape description for such objects. The aim of silhouette-based object tracking is to find the object region in every frame by means of an object model generated from the previous frames. These methods are capable of dealing with a variety of object shapes, occlusion, and object split and merge.
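To make the kernel tracking idea above a little more concrete, here is a minimal sketch that assumes a purely translational motion model: the object is a rectangular grayscale template, and the tracker searches a small window around the previous position for the offset with the lowest sum of squared differences (SSD). The function names, the SSD cost and the search radius are assumptions of this example. Real kernel trackers (mean shift, for instance) use richer appearance models and smarter search strategies.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between the template and a frame region."""
    total = 0
    for dy, template_row in enumerate(template):
        for dx, t in enumerate(template_row):
            diff = frame[top + dy][left + dx] - t
            total += diff * diff
    return total


def track_translation(frame, template, prev_top, prev_left, radius=2):
    """Find the template's new (top, left) near its previous position.

    Only positions within `radius` pixels of the previous location are
    tested, so this is a local search rather than a scan of the whole frame.
    """
    h, w = len(template), len(template[0])
    frame_h, frame_w = len(frame), len(frame[0])
    best = None
    for top in range(max(0, prev_top - radius),
                     min(frame_h - h, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius),
                          min(frame_w - w, prev_left + radius) + 1):
            cost = ssd(frame, template, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    if best is None:
        raise ValueError("search window lies outside the frame")
    return best[1], best[2]


template = [[9, 8], [7, 6]]
frame = [[0] * 6 for _ in range(6)]
for dy in range(2):
    for dx in range(2):
        frame[3 + dy][2 + dx] = template[dy][dx]  # object moved to (3, 2)

print(track_translation(frame, template, prev_top=2, prev_left=1))  # (3, 2)
```

Restricting the search to a small window is what makes this kind of tracker cheap, at the cost of losing the object if it moves further than `radius` pixels between frames.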
In the next part of this blog we’ll discuss multi object tracking and multi camera tracking.
At Aidetic, we specialize in providing artificial intelligence enabled video analytics solutions. We build customized software for retail analytics, production line management, automated surveillance, and surveying and land mapping. Moreover, we also handcraft unique solutions for our clients' specific use cases. Feel free to write to us at email@example.com to learn if we can help you with your specific use cases.