The Video Processing Pipeline

Updated: Oct 23, 2018

In a typical video processing application, each step of the pipeline produces increasingly detailed information: low-level features are detected first, and high-level insights are then drawn by correlating the analysis products of earlier steps. The pipeline typically progresses from low-level tasks such as face detection and object detection towards high-level insights such as face recognition and gender demographics.

Most video analytics applications comprise a series of frame-processing steps. At a fundamental level, analyzing a stream means detecting changes that occur over successive frames of video, quantifying these changes in each frame, correlating them across multiple frames, and finally interpreting the correlated changes to provide insights relevant to the use case the pipeline is meant to solve.
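These four stages can be sketched in a few lines. This is a minimal illustration, not a production approach: frames are simulated as flat lists of pixel intensities, and "change" is just the mean absolute pixel difference between consecutive frames.

```python
# Sketch of the four stages above: detect per-frame change, quantify it,
# correlate the scores over multiple frames, and interpret them as events.

def frame_diff(prev, curr):
    """Quantify change between two frames as mean absolute pixel difference."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)

def analyze(frames, threshold=10.0):
    """Correlate per-frame change scores and interpret them as motion events."""
    scores = [frame_diff(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    events = [i for i, s in enumerate(scores, start=1) if s > threshold]
    return scores, events

static = [0] * 16
moving = [50] * 16
scores, events = analyze([static, static, moving, moving])
# Change is detected only at the transition frame (index 2).
```

A real pipeline would operate on decoded frames (e.g. NumPy arrays from a video decoder) and use far more robust change measures, but the detect–quantify–correlate–interpret structure is the same.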

Video analytics pipelines today commonly combine Convolutional Neural Networks with traditional computer vision techniques to improve the quality of insights derived from video streams. The operations involved in video processing include, but are not limited to:

  1. Object detection

  2. Image segmentation

  3. Image classification

  4. Object tracking

  5. Activity recognition

  6. Face recognition

All of these are described in the previous post that discusses the techniques at the core of video analytics. This article provides an overview of the steps involved in a real-time video processing pipeline.

Components of a Video Processing Pipeline

A Video Processing Pipeline would consist of the following components:

  1. Source of Video

  2. Video Capture Client

  3. Analytics Engine

  4. Database

  5. User Interface

  6. Optional Notification Pipeline

A Typical Video Processing Pipeline

Steps Involved in Video Processing

The components described above fit together to form a sequence of steps through which the system delivers the final analysis product:

  1. Frames from one or more cameras are sent to a Video Capture Client, a system that handles incoming video streams and makes them available to the rest of the system for further processing. Video streaming engines can function well as video capture clients.
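A capture client can be sketched as a bounded buffer between sources and consumers. This is a hypothetical illustration using only the standard library: the "source" is any iterable of frames, standing in for a camera stream or streaming engine.

```python
# Sketch of a Video Capture Client: ingest frames from a source and
# buffer them for downstream consumers (the analytics engine).

import queue
import threading

class CaptureClient:
    def __init__(self, maxsize=100):
        # Bounded queue so a slow analytics engine applies backpressure.
        self.frames = queue.Queue(maxsize=maxsize)

    def ingest(self, source):
        """Read frames from a source (any iterable of frames) into the buffer."""
        for frame in source:
            self.frames.put(frame)

    def next_frame(self, timeout=1.0):
        """Hand the oldest buffered frame to the analytics engine."""
        return self.frames.get(timeout=timeout)

client = CaptureClient()
threading.Thread(target=client.ingest, args=(iter(range(3)),)).start()
first = client.next_frame()  # frames come out in arrival order
```

In practice the ingest side would wrap a decoder or streaming engine (e.g. reading RTSP streams), but the buffering contract is the same.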

  2. The Analytics Engine is the core of the system: it extracts information from incoming frames and correlates it to draw inferences. The engine starts processing the new stream made available by the streaming engine, and is logically divided into two parts: low-level analytics first extracts raw information from the data, and high-level analytics then derives correlations from that raw information to form actionable insights the user can view.
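The two logical stages can be sketched as follows. This is a toy illustration: the low-level stage is a stub detector, and the "insight" is a simple aggregate count; real stages would run detection models and trackers.

```python
# Sketch of the two-stage analytics engine: low-level analysis emits raw
# per-frame detections; high-level analysis correlates them across frames
# into an aggregate insight a user can act on.

def low_level(frame):
    """Raw per-frame analysis: here, a stub that 'detects' labeled objects."""
    return [obj for obj in frame]  # bounding boxes + labels in practice

def high_level(detections_per_frame):
    """Correlate raw detections across frames into an aggregate insight."""
    counts = {}
    for detections in detections_per_frame:
        for label in detections:
            counts[label] = counts.get(label, 0) + 1
    return counts

stream = [["person"], ["person", "car"], ["car"]]
insight = high_level(low_level(f) for f in stream)
# → {'person': 2, 'car': 2}
```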

  3. To store the insights derived from the analysis, a combination of relational and non-relational databases can be used, covering both low-level and high-level analytics data.
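As a minimal sketch of the relational side, high-level insights can be written to SQLite; the schema and table names here are illustrative only, and a document store might hold the bulkier low-level detections.

```python
# Sketch of storing high-level insights in a relational database.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE insights (ts INTEGER, camera_id TEXT, label TEXT, count INTEGER)"
)
conn.execute(
    "INSERT INTO insights VALUES (?, ?, ?, ?)", (1000, "cam-1", "person", 3)
)
rows = conn.execute(
    "SELECT label, count FROM insights WHERE camera_id = ?", ("cam-1",)
).fetchall()
# rows → [('person', 3)]
```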

  4. Optionally, a notification pipeline may be present to deliver live alerts to key personnel when time-sensitive events occur, such as a possible robbery in a store (important in surveillance systems). Users may be notified of such events via Email/SMS notifications.
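The notification step reduces to a rule plus a formatter; actual delivery (e.g. via an email or SMS gateway) is stubbed out in this sketch, and the insight keys are made up for illustration.

```python
# Sketch of an optional notification step: a rule checks each high-level
# insight and formats an alert for time-sensitive events.

def should_alert(insight, threshold=1):
    """Fire when a watched event count crosses a threshold."""
    return insight.get("suspicious_activity", 0) >= threshold

def format_alert(insight, camera_id):
    """Build the message a delivery backend (email/SMS) would send."""
    n = insight["suspicious_activity"]
    return f"ALERT [{camera_id}]: suspicious activity detected ({n} events)"

insight = {"suspicious_activity": 2}
message = format_alert(insight, "store-cam-4") if should_alert(insight) else None
```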

  5. The insights gained are delivered to the user in the form of charts and graphs via a User Interface, typically a web interface that comprises appropriate graphic tools for presenting them.
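To feed such charts, stored insights are typically shaped into a serializable payload for the front end. The JSON shape below is an assumption for illustration, not a fixed API.

```python
# Sketch of shaping stored insights into a chart-ready payload for a
# web UI, e.g. to feed a bar chart of object counts.

import json

insights = [("person", 12), ("car", 7), ("bicycle", 3)]
payload = json.dumps({
    "chart": "bar",
    "labels": [label for label, _ in insights],
    "values": [count for _, count in insights],
})
```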

Various processing steps in the pipeline may be based on models, such as a scene model, camera model, tracking model, or motion model, which describe the processing performed on the stream. These models are updated over time as the system improves and more data becomes available. Not every analytics pipeline includes all processing steps. Based on the domain and use case, a processing pipeline might:

  1. Include additional steps or fewer steps,

  2. Apply them in a different order, or

  3. Run multiple processes in parallel.

The exact structure of a pipeline is driven by the use case and the application architecture.
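Composing a use-case-specific pipeline from an ordered list of steps can be sketched as below; the step functions are placeholders, where real ones would be detectors, trackers, and so on.

```python
# Sketch of assembling different pipelines (fewer steps, different
# order) from the same pool of processing steps. Each step is a
# function from frame data to frame data.

def run_pipeline(frame, steps):
    for step in steps:
        frame = step(frame)
    return frame

# Illustrative steps; real ones would be detection, tracking, etc.
grayscale = lambda f: {**f, "gray": True}
detect = lambda f: {**f, "objects": ["person"]}

surveillance = [grayscale, detect]   # one ordering of steps
retail = [detect]                    # fewer steps for another use case
out = run_pipeline({"id": 1}, surveillance)
```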

We hope you found this post informative. Let us know in the comments if you have any questions.

At Aidetic, we specialize in providing artificial intelligence enabled video analytics solutions. We build customized software for retail analytics, production line management, automated surveillance, and surveying and land mapping. We also handcraft unique solutions for our clients' specific needs. Feel free to write to us to learn if we can help you with your specific use cases.

Aidetic Software Private Limited | All Rights Reserved