Video Analytics - What Lies at the Core

Updated: Oct 23, 2018

Artificial intelligence is seeping further into our lives with each passing day. New products and applications that rely heavily on AI are being built with the aim of revolutionizing the way we go about our lives. However, with all the hype around AI today, it is easy to be awestruck by the innovation and regard most AI software as a black box. The aim of this post is to show that this doesn’t necessarily have to be the case.

Looking at the end results of a well-engineered AI application can make the software powering it look unnaturally intelligent and revolutionary. Indeed, much of the best AI software out there is revolutionary, to say the least. However, the software isn’t a single huge program with an artificial brain that throws out these seemingly “intelligent” results when prompted. Instead, it is composed of several smaller AI programs (for example, facial recognition or object tracking) that are absolutely nailing it under the hood. Each of these relatively simple, specialized programs does one particular task very well, and they work in coordination to make the whole greater than the sum of its parts. Thus, the effective intelligence of any AI software can be attributed to an efficient division of labour among the lower-level AI services that constitute it.

Here we take the case of intelligent video analytics and automated surveillance - a category of cutting-edge technology employing AI at its core - and break it down to unveil the core technologies. Video analytics involves extracting information from video streams: identifying people, detecting and categorizing actions, tracking objects, noting anomalies, and so on. Refer to this article for a more detailed discourse on possible use cases of video analytics.

Let us first look at each technology individually and then put it all into perspective with a deep dive into the pipeline of a typical video analytics application in the next post.

1. Object Detection

This is one of the relatively simple tasks, but it is nonetheless very important. Objects, most often people, are detected in video streams using neural networks (of course, what else did you expect!). Frames of the video are fed into convolutional neural networks (CNNs), which return boxes delineating the objects along with a probability score for the type of each object. The image below shows a standard example.

Object Detection in Action. [Source: Google Images]
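
To make the "boxes plus probability scores" idea concrete, here is a minimal sketch of what typically happens to a detector's raw output before it is used. It is not taken from any particular product: the `Detection` type, the function names, and the two thresholds are our own illustrative assumptions. The sketch drops low-confidence boxes and then applies non-maximum suppression, a common post-processing step that removes near-duplicate boxes for the same object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure for illustration)."""
    label: str    # predicted object type, e.g. "person"
    score: float  # probability score in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, min_score=0.5, max_iou=0.5):
    """Keep confident detections and suppress overlapping duplicates.

    Both thresholds are assumed values; real systems tune them per use case.
    """
    kept = []
    # Visit boxes from most to least confident.
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if det.score < min_score:
            break  # everything after this is even less confident
        # Keep the box unless it heavily overlaps a kept box of the same type.
        if all(det.label != k.label or iou(det.box, k.box) <= max_iou
               for k in kept):
            kept.append(det)
    return kept
```

Given two heavily overlapping "person" boxes, only the higher-scoring one survives, which is why a person in the output image is usually outlined by a single box.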

2. Facial Recognition

Facial recognition is another core technology embedded in video analytics systems. Known faces are encoded and stored in a database. When the software detects a face, it uses an algorithm such as FaceNet to compare it against the faces in the database. Again, at the core, convolutional neural networks generate feature vectors (a fancy term for a set of numbers describing a face), which are compared against those stored in the database. Facial recognition comes in really handy in surveillance for identifying trespassers and has uses in retail for organizing data on consumers.

[Source: Google Images]
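
The comparison step described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function names, the toy three-number vectors, and the distance threshold are hypothetical, and the feature vectors are assumed to have already been produced by a face-embedding network. FaceNet-style systems compare faces by the distance between their vectors, so the sketch picks the closest known face and rejects the match if it is too far away.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, known_faces, max_distance=0.6):
    """Return the name of the closest known face, or None if nobody is close.

    known_faces maps a name to a stored feature vector.
    max_distance is an assumed threshold; it has to be tuned per model.
    """
    best_name, best_distance = None, float("inf")
    for name, reference in known_faces.items():
        distance = l2_distance(embedding, reference)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None


# Toy database with made-up 3-number vectors; real embeddings are much longer.
known_faces = {
    "alice": [0.1, 0.9, 0.2],
    "bob": [0.8, 0.1, 0.5],
}
match = identify([0.12, 0.88, 0.21], known_faces)
```

Returning `None` for faces that are far from every stored vector is what lets a surveillance system flag an unknown person instead of forcing a wrong match.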

3. Demographics Detection

Once people are detected in the frames, some of their attributes can also be detected using deep learning. Cropped images of the detected persons can be passed into another neural network that specializes in detecting attributes such as age, gender, clothing color, and attire type. Demographic information can be used to index videos for retrospective search in surveillance use cases, as well as to inform retailers about their customers’ attributes. A high degree of accuracy can be achieved if the deep learning models are trained well.
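
The "crop, then classify" flow can be sketched as follows. The frame is represented as a plain list of pixel rows to keep the example self-contained, and `attribute_model` is a placeholder for whatever trained network does the actual attribute prediction; both are our assumptions, as are the function names.

```python
def crop(frame, box):
    """Cut the region box = (x1, y1, x2, y2) out of a frame.

    frame is a list of pixel rows; the box is clamped to the frame edges
    because detectors can return boxes that spill slightly outside the image.
    """
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    return [row[x1:x2] for row in frame[y1:y2]]


def describe_people(frame, person_boxes, attribute_model):
    """Run a second-stage model on the crop of every detected person.

    attribute_model is any callable that takes a cropped image and returns
    its predictions (a stand-in for a trained attribute network).
    """
    return [attribute_model(crop(frame, box)) for box in person_boxes]
```

Keeping the attribute network separate from the detector means either one can be retrained or swapped without touching the other, which is the division of labour described at the start of this post.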

4. Multi-object Tracking

Obviously, software responsible for something as complex as automated video analysis has to have a lot of constituent elements, not all of which can be achieved with AI alone. Multi-object tracking across multiple cameras is one such feature, requiring an amalgam of traditional computer vision and AI algorithms. Stochastic models are used to track objects within a single camera’s view, and feature vectors generated using deep learning tie the tracked objects together across multiple cameras. If you’re interested in learning more about tracking, stay in the loop for an upcoming post where we delve into this subject in more detail. Tracking is immensely useful for understanding how people move around, be it inside a property under surveillance or in a retail setting.

Multi Object Tracking. [Source: Google Images]
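
The cross-camera part of tracking can be illustrated with a small sketch. It assumes each camera already has its own tracks and that every track carries one appearance feature vector from a deep learning model; the single-camera stochastic model is not shown. The function names, the greedy matching strategy, and the similarity threshold are our own illustrative choices, not a description of any specific system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_tracks(camera_a, camera_b, min_similarity=0.8):
    """Greedily pair tracks from two cameras by appearance similarity.

    camera_a and camera_b map a track id to that track's feature vector.
    Returns a dict mapping camera_a track ids to camera_b track ids.
    min_similarity is an assumed threshold that needs tuning in practice.
    """
    candidates = sorted(
        ((cosine_similarity(vec_a, vec_b), id_a, id_b)
         for id_a, vec_a in camera_a.items()
         for id_b, vec_b in camera_b.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    links, used_b = {}, set()
    for similarity, id_a, id_b in candidates:
        if similarity < min_similarity:
            break  # remaining pairs are even less similar
        if id_a not in links and id_b not in used_b:
            links[id_a] = id_b
            used_b.add(id_b)
    return links
```

Tracks that find no sufficiently similar partner stay unlinked, which corresponds to a person who was seen by only one of the two cameras.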

We hope you found this post informative. Let us know in the comments in case you have any questions.

At Aidetic, we specialize in providing artificial-intelligence-enabled video analytics solutions. We build customized software for retail analytics, production line management, automated surveillance, and surveying and land mapping. We also handcraft unique solutions for our clients' specific use cases. Feel free to write to us to learn if we can help you with yours.

Aidetic Software Private Limited | All Rights Reserved