Do all versions of OpenCV run at the same speed?

OpenCV was originally created by Intel to demonstrate how fast Intel CPUs can perform, so speed was of crucial importance. However, OpenCV has recently changed from a low-level image processing C library built for maximum speed to a higher-level computer vision C++ library. Many cutting-edge techniques are added with each new version, and recent versions have made several major changes to both the file structure and the API. The good news is that the new C++ interface is usually easier to use than the old C interface. The bad news is that many functions are now slower than they were in earlier versions! Since the OpenCV developers want everyone to migrate from the old C interface to the new C++ interface, it is not obvious which version of OpenCV should be used for a project. Some timing tests are therefore shown here for two very common operations in OpenCV: Hough Line Detection and Haar Face Detection.

Note: These timings are for a single core. Newer versions of OpenCV take more advantage of multi-core CPUs (e.g. using the TBB library), so newer versions might actually run faster than older versions on multi-core CPUs, even if the single-threaded performance shown here says the opposite.

Line Detection using Probabilistic HoughLines2

The Hough transform is a popular method for detecting straight lines (or potentially circles) in a noisy image. First the image must be converted to a binary image (black and white, not grayscale), which is often done using the Canny edge detector due to its reliability and speed. The traditional Hough line detector in OpenCV provides the orientation of detected lines but not their start and end points, whereas the Probabilistic HoughLines detector gives the start and end points of detected lines and is therefore useful for more applications.
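To illustrate why the missing end points matter: the standard Hough transform describes each line only by its distance rho from the image origin and the angle theta of its normal, so drawing such a line means choosing end points yourself. The following is a minimal sketch of that conversion in plain C++, assuming rho is in pixels and theta in radians. The names Point2, Segment and houghLineToSegment are hypothetical helpers for this example and are not part of OpenCV.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for this sketch; they are not OpenCV types.
struct Point2 { double x, y; };
struct Segment { Point2 a, b; };

// Convert a standard Hough line (rho in pixels, theta in radians) into a
// drawable segment that extends halfLength pixels either side of the point
// on the line that is closest to the image origin.
Segment houghLineToSegment(double rho, double theta, double halfLength)
{
    assert(halfLength > 0.0);
    const double c = std::cos(theta);
    const double s = std::sin(theta);
    const Point2 foot = { rho * c, rho * s };   // closest point to the origin
    // The line direction (-s, c) is perpendicular to the normal (c, s).
    const Segment seg = {
        { foot.x - halfLength * s, foot.y + halfLength * c },
        { foot.x + halfLength * s, foot.y - halfLength * c }
    };
    return seg;
}
```

For example, houghLineToSegment(10.0, 0.0, 5.0) describes the vertical line x = 10 and returns the segment from (10, 5) to (10, -5). The segment length is arbitrary, which is exactly the information the probabilistic detector supplies for you.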

Here are the results of some timing tests of OpenCV’s Probabilistic Hough Line Detector, showing the best time out of 500 runs on a 2.4GHz Intel Core 2 Duo, compiled with full optimizations in VS2008, on the 512×512 pixel “Lena.jpg” image:

Version:	Hough Lines:
v1.1 (C):	33ms
v2.0 (C):	38ms
v2.0 (C++):	39ms
v2.1 (C):	43ms
v2.1 (C++):	46ms
v2.2 (C):	62ms
v2.2 (C++):	48ms
v2.3 (C):	48ms
v2.3 (C++):	46ms

Conclusion for Hough Line Detection

With the traditional C interface, the Probabilistic Hough Line detector takes roughly twice as long to run in OpenCV v2.2 as in v1.1 (62ms versus 33ms). With the new C++ interface, v2.2 is still about 45% slower than v1.1 (48ms versus 33ms). The slow C code has been fixed in v2.3.1.
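The comparisons in this conclusion come straight from the table. As a small worked example, here is a sketch of the arithmetic in plain C++. The function names slowdownRatio and percentSlower are made up for this illustration.

```cpp
#include <cassert>

// Ratio of a measured time to a baseline time (2.0 means "takes twice as long").
double slowdownRatio(double baselineMs, double measuredMs)
{
    assert(baselineMs > 0.0);
    return measuredMs / baselineMs;
}

// Percentage by which measuredMs exceeds baselineMs (50.0 means "50% slower").
double percentSlower(double baselineMs, double measuredMs)
{
    return (slowdownRatio(baselineMs, measuredMs) - 1.0) * 100.0;
}
```

Using the Hough Lines table, slowdownRatio(33, 62) is about 1.88 for v2.2 (C) against v1.1 (C), and percentSlower(33, 48) is about 45.5 for v2.2 (C++) against v1.1 (C).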

Face or Object Detection using Haar Cascades

OpenCV has an object detector that is both very fast and very reliable in real-world conditions, making it perhaps the most useful feature of the entire OpenCV library. OpenCV comes with several different detectors for frontal faces that are all very reliable, as well as several detectors for other body parts with reasonable reliability. Custom object detectors can also be trained for other uses such as car detection (using thousands of sample photos and running the tool for roughly 1 week to process all the images!). While this object detector is extremely fast compared to previous face detection methods, it is still usually the slowest part of most computer vision programs, since it typically needs several hundred milliseconds per frame.

Shrinking the input image can give a very large speed boost without necessarily reducing the detection reliability, so the easiest way to get a faster face detector is to shrink your input images to the smallest reasonable size that your project allows (e.g. 200×100 if you are only looking for 1 person who is always near the camera). There are also various parameters that can be adjusted in the object detector to give better speed or better results (searchScaleFactor, minNeighbors and minFaceSize). But there is another factor that is not obvious: the choice of OpenCV version, and the choice of C or C++ interface.
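If you detect faces on a shrunken copy of the image, the detected rectangles are in the small image's coordinates and need to be scaled back up before you use them on the original. Here is a minimal sketch of that bookkeeping in plain C++, assuming the image is shrunk uniformly so that one scale factor applies to both axes. Box, shrinkScale and toFullSize are hypothetical names for this example (OpenCV has its own rectangle types), and the actual resizing and detection calls are left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical rectangle type for this sketch; not an OpenCV type.
struct Box { int x, y, width, height; };

// Scale factor that shrinks an image of the given width down to targetWidth.
// Returns 1.0 if the image is already small enough.
double shrinkScale(int imageWidth, int targetWidth)
{
    assert(imageWidth > 0 && targetWidth > 0);
    if (imageWidth <= targetWidth)
        return 1.0;
    return static_cast<double>(targetWidth) / imageWidth;
}

// Round to the nearest integer (half rounds up).
static int roundToInt(double v)
{
    return static_cast<int>(std::floor(v + 0.5));
}

// Map a rectangle detected in the shrunken image back to full-size coordinates.
Box toFullSize(const Box &small, double scale)
{
    assert(scale > 0.0);
    Box full;
    full.x      = roundToInt(small.x / scale);
    full.y      = roundToInt(small.y / scale);
    full.width  = roundToInt(small.width / scale);
    full.height = roundToInt(small.height / scale);
    return full;
}
```

For a 640 pixel wide photo shrunk to 320 pixels, shrinkScale(640, 320) gives 0.5, and a face found at (10, 20) with size 30×40 in the small image maps back to (20, 40) with size 60×80 in the original.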

Some tests of OpenCV’s Haar Face Detector were performed using different versions of OpenCV, as well as comparing the C interface and the C++ interface. Several different images were tested, and a typical image was used for the following results. It is important to notice that there are fairly large differences in speed, but also differences in detected faces and reliability. In general, the newer versions of OpenCV are slower but they detect fewer “nonfaces”, which is more important than speed in some projects. The results here show the best time out of 30 runs, performed on a 2.4GHz Intel Core 2 Duo, compiled with full optimizations in MS Visual Studio 2008, using OpenCV’s “haarcascade_frontalface_alt.xml” face detector on a 640×504 pixel photo containing 8 frontal faces:

Version:	Time:		Faces Found:	Nonfaces:
v1.1 (C):	360ms		7 of 8 faces	3 nonfaces
v2.0 (C):	370ms		7 of 8 faces	2 nonfaces
v2.0 (C++):	800ms		7 of 8 faces	3 nonfaces
v2.1 (C):	680ms		7 of 8 faces	1 nonface
v2.1 (C++):	490ms		7 of 8 faces	0 nonfaces
v2.2 (C):	680ms		7 of 8 faces	1 nonface
v2.2 (C++):	490ms		7 of 8 faces	0 nonfaces
v2.3 (C):	820ms		7 of 8 faces	1 nonface
v2.3 (C++):	490ms		7 of 8 faces	0 nonfaces

Conclusion for Face Detection

With the C interface, the old code (v1.1 and v2.0) was about twice as fast as the newer code (v2.1 to v2.3). The new C++ code is more reliable and only about 35% slower than the old C code (except in OpenCV v2.0, where the C++ interface was much slower!). So you should use OpenCV v2.0 or older if you want to use the C interface, or shrink the image and use OpenCV v2.1 or newer if you want to use the C++ interface. Note that the new C++ interface also includes an LBP face detector, which runs about 4 times faster than Haar and has fewer false positives, but also detects fewer faces (true positives).

Millisecond Timers used for these tests

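// Record the execution time of some code, in milliseconds. By Shervin Emami, May 4th 2011.
// Needs <stdio.h> and the OpenCV core header (for int64, cvGetTickCount, cvGetTickFrequency and cvRound).
// cvGetTickFrequency() returns ticks per microsecond, hence the 0.001 factor to get milliseconds.
// eg:
//	DECLARE_TIMING(myTimer);
//	START_TIMING(myTimer);
//	  printf("A slow calc = %f\n", 1.0/sqrt(2.0) );
//	STOP_TIMING(myTimer);
//	SHOW_TIMING(myTimer, "My Timer");
// Declares the variables for a timer named s. Must appear before the other macros are used with s.
#define DECLARE_TIMING(s)	int64 timeStart_##s = 0; int64 timeDiff_##s = 0; int64 timeTally_##s = 0; int64 countTally_##s = 0
#define START_TIMING(s)		timeStart_##s = cvGetTickCount()
// Multi-statement macros are wrapped in do { } while (0) so they are safe inside an unbraced if/else.
#define STOP_TIMING(s)		do { timeDiff_##s = (cvGetTickCount() - timeStart_##s); timeTally_##s += timeDiff_##s; countTally_##s++; } while (0)
// Time of the most recent START/STOP pair, in milliseconds.
#define GET_TIMING(s)		(double)(0.001 * ( (double)timeDiff_##s / (double)cvGetTickFrequency() ))
// Average time across all START/STOP pairs since the last clear, in milliseconds.
#define GET_AVERAGE_TIMING(s)	(double)(countTally_##s ? 0.001 * ( (double)timeTally_##s / ((double)countTally_##s * cvGetTickFrequency()) ) : 0)
#define GET_TIMING_COUNT(s)	(int)(countTally_##s)
#define CLEAR_AVERAGE_TIMING(s)	do { timeTally_##s = 0; countTally_##s = 0; } while (0)
#define SHOW_TIMING(s, msg)	printf("%s time: \t %dms \t (%dms average across %d runs).\n", msg, cvRound(GET_TIMING(s)), cvRound(GET_AVERAGE_TIMING(s)), GET_TIMING_COUNT(s) )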
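
The macros above report the latest time and the running average, while the tables on this page quote the best time out of many runs. Here is a minimal sketch of a best-of-N measurement, assuming a C++11 compiler and using std::chrono from the standard library rather than OpenCV's tick counter. The function name bestOfRunsMs is made up for this example, and this is not the code that produced the timings above.

```cpp
#include <cassert>
#include <chrono>

// Run `work` the given number of times and return the fastest single run,
// in milliseconds. `work` is any callable that takes no arguments.
template <typename Func>
double bestOfRunsMs(Func work, int runs)
{
    assert(runs > 0);
    typedef std::chrono::steady_clock Clock;
    double bestMs = -1.0;   // negative means "no run recorded yet"
    for (int i = 0; i < runs; ++i) {
        const Clock::time_point start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
        if (bestMs < 0.0 || elapsed.count() < bestMs)
            bestMs = elapsed.count();
    }
    return bestMs;
}
```

It could be called as bestOfRunsMs([&]{ /* code to measure */ }, 500). Taking the minimum rather than the average reduces the influence of runs that were disturbed by other activity on the machine.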
