Artificial Intelligence 9 min read

Real-Time Object Detection with OpenCV, Python, and Deep Learning

This tutorial walks through extending a deep‑learning object detector to process live video streams using OpenCV and Python, covering setup, command‑line arguments, model loading, frame‑by‑frame detection, drawing bounding boxes, FPS measurement, and performance tips.

MaGe Linux Operations

Nov 16, 2018

Real-Time Object Detection with OpenCV, Python, and Deep Learning

Using Deep Learning and OpenCV for Video Object Detection

Applying OpenCV and Python to real‑time video streams for deep‑learning object detection is straightforward: combine the right code, connect a live video source, and integrate the existing detection logic.

The guide is split into two parts. First, we expand a classic detection project to handle video streams and video files via the VideoStream class.

Preparing the Script

Create a new file named real_time_object_detection.py and add the initial imports (lines 2‑8). Ensure you have imutils and OpenCV 3.3 (or newer) with the opencv-contrib package installed.

Parse command‑line arguments for --prototxt (Caffe model definition), --model (pre‑trained weights), and --confidence (minimum probability, default 20%).

Initialize the list of class labels and assign random colors for each.

Load the serialized model using the provided prototxt and model files (line 30).

Start the video stream (camera or file) with VideoStream (line 35), wait for the camera to warm up (line 36), and begin FPS counting (line 37). The VideoStream and FPS utilities come from the imutils package.

Iterate over each frame: read the frame, resize it, construct a blob for the DNN module, set the blob as network input, and obtain detections.

For each detection, extract the confidence (line 59). If it exceeds the threshold (line 63), retrieve the class index (line 67) and compute the bounding‑box coordinates (lines 68‑69). Build a label containing the class name and confidence (lines 72‑73) and draw a colored rectangle and label on the frame (lines 74‑78). If there isn’t enough space above the box, place the label slightly below the top edge.

After processing detections, display the frame (line 81), check for the “q” key to quit (lines 82‑86), and update the FPS counter (line 89).

When the loop ends (either by pressing “q” or reaching the end of the stream), stop the FPS timer, print the elapsed seconds per frame, close the window, and release the video stream.

Results

Running the script on a webcam shows live frames with bounding boxes around detected objects, including people, sofas, and chairs. The detector processes roughly 6‑8 FPS depending on hardware.

Performance Tips

Skip frames to reduce workload.

Use faster MobileNet variants (trade‑off accuracy for speed).

Try quantized SqueezeNet models for an even smaller footprint.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Computer Vision object detection Video Stream

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.