Question

I'm looking for the fastest and most efficient method of detecting an object in a moving video. Things to note about this video: it is very grainy and low resolution, and both the background and foreground are moving simultaneously.

Note: I'm trying to detect a moving truck on a road in a moving video.

Methods I've tried:

Training a Haar Cascade - I attempted to train the classifier to identify the object by cropping multiple images of the desired object. This produced either many false detections or no detections at all (the desired object was never detected). I used about 100 positive images and 4000 negatives.

SIFT and SURF Keypoints - When attempting to use either of these feature-based methods, I discovered that the object I wanted to detect was too low in resolution, so there were not enough features to match for an accurate detection. (The desired object was never detected.)

Template Matching - This is probably the best method I've tried. It's the most accurate, although the most hacky of them all. I can detect the object for one specific video using a template cropped from the video. However, there is no guaranteed accuracy, because all that is known is the best match for each frame; no analysis is done on how well the template actually matches the frame. Basically, it only works if the object is always in the video; otherwise it will create a false detection.

So those are the big 3 methods I've tried, and all have failed. What would work best is something like template matching but with scale and rotation invariance (which led me to try SIFT/SURF), but I have no idea how to modify the template matching function.

Does anyone have any suggestions on how best to accomplish this task?

Answer 1

Apply optical flow to the image and then segment it based on the flow field. Background flow is very different from "object" flow (which mainly diverges or converges depending on whether it is moving towards or away from you, with some lateral component as well).

Here's an oldish project which worked this way:

http://users.fmrib.ox.ac.uk/~steve/asset/index.html
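As a rough illustration of segmenting on the flow field, here is a minimal sketch. It is not the method used by the project linked above. It assumes you already have a dense flow field (for example from OpenCV's `cv2.calcOpticalFlowFarneback`) and that the background covers most of the frame and moves roughly uniformly, as with a panning camera. The function name `segment_by_flow` and the threshold value are made up for the example; a forward-moving camera would need a richer background motion model than a single median vector.

```python
from statistics import median

def segment_by_flow(flow, threshold=1.5):
    """Return a mask of pixels whose flow deviates from the dominant motion.

    `flow` is a 2-D grid (list of rows) of (dx, dy) vectors. The background
    motion is estimated as the per-component median, which ASSUMES the
    background covers most of the frame and moves roughly uniformly.
    """
    vectors = [v for row in flow for v in row]
    bg_dx = median(dx for dx, _ in vectors)
    bg_dy = median(dy for _, dy in vectors)
    return [
        [((dx - bg_dx) ** 2 + (dy - bg_dy) ** 2) ** 0.5 > threshold
         for dx, dy in row]
        for row in flow
    ]

# Toy 4x5 flow field: the camera pans, so the background flow is (-2, 0),
# while a 2x2 "truck" region moves differently.
flow = [[(-2.0, 0.0)] * 5 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3):
        flow[r][c] = (1.0, 0.5)

mask = segment_by_flow(flow)
for row in mask:
    print("".join("#" if cell else "." for cell in row))
```

The connected region of flagged pixels is then a candidate for the moving object; on grainy footage you would probably want to smooth the flow or the mask before trusting it.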

Answer 2

This vehicle detection paper uses a Gabor filter bank for low-level detection and then uses the filter responses to build the feature space in which it trains an SVM classifier.

The technique seems to work well and is at least scale invariant. I am not sure about rotation though.
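To make the idea of a Gabor filter bank concrete, here is a small sketch using the standard real-valued Gabor formula. I don't know the exact parameters or feature construction the paper uses, so the kernel size, orientations, wavelengths, and the helper names `gabor_kernel` and `filter_response` are all assumptions chosen for illustration.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list of lists (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(patch, kernel):
    """Sum of element-wise products of a patch and a kernel of equal size."""
    return sum(p * k
               for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))

# A small bank: 4 orientations x 2 wavelengths (illustrative values only).
bank = [
    gabor_kernel(size=9, sigma=2.0, theta=i * math.pi / 4, wavelength=w)
    for i in range(4)
    for w in (4.0, 8.0)
]

# Feature vector for one 9x9 patch: one response per filter in the bank.
patch = [[1.0 if x % 4 < 2 else 0.0 for x in range(9)] for _ in range(9)]
features = [filter_response(patch, k) for k in bank]
```

Feature vectors like this, computed for labelled truck and non-truck patches, are what you would hand to an SVM (for example scikit-learn's `sklearn.svm.SVC`). Using several wavelengths is what gives the bank some tolerance to scale.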

Answer 3

Not knowing your application, my first thought is normalized cross-correlation, especially since I remember seeing a purely optical cross-correlator that had vehicle tracking as the example application. (Tracking a vehicle as it passes using only optical components and an image of the side of the vehicle - I wish I could find the link.) This is similar (if not identical) to "template matching", which you say kind of works, but this won't work if the images are rotated, as you know.

However, there's a related method based on log-polar coordinates that will work regardless of rotation, scale, shear, and translation.
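The reason log-polar coordinates help is that a scaling and a rotation about the centre both turn into plain shifts, which ordinary correlation can handle. The sketch below only demonstrates that property on a few points; it is not a full matcher, and the helper names are made up for the example. In a complete pipeline, translation is usually dealt with separately, for instance by working on the Fourier magnitude first. To resample a whole image, OpenCV offers `cv2.warpPolar` with the `cv2.WARP_POLAR_LOG` flag.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log radius, angle) about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def scale_and_rotate(x, y, scale, angle):
    """Scale and rotate a point about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

points = [(3.0, 1.0), (1.0, 2.0), (4.0, 0.5)]
scale, angle = 2.0, math.radians(30)

shifts = []
for x, y in points:
    rho0, phi0 = to_log_polar(x, y)
    rho1, phi1 = to_log_polar(*scale_and_rotate(x, y, scale, angle))
    shifts.append((rho1 - rho0, phi1 - phi0))

# Every point moves by the same offset: (log(scale), angle).
for d_rho, d_phi in shifts:
    print(round(d_rho, 6), round(math.degrees(d_phi), 6))
```

Because every point shifts by the same amount, correlating the log-polar versions of the template and the frame finds the scale and rotation as the position of the correlation peak.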

I imagine this would also make it possible to detect that the object has left the scene of the video, since the maximum correlation will decrease.
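Here is a minimal sketch of that idea: compute a zero-mean normalized cross-correlation score at every position, and only report a detection when the best score clears a threshold. The `min_score` value of 0.8 and the function names are assumptions for the example and would need tuning on real footage. With OpenCV, the same thing is normally done with `cv2.matchTemplate` using `cv2.TM_CCOEFF_NORMED`, followed by `cv2.minMaxLoc`.

```python
import math

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size grids."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # flat patch or template: treat as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(image, template):
    """Slide the template over the image; return (score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-1.0, None)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best

def detect(image, template, min_score=0.8):
    """Return the match position, or None if the best score is too low."""
    score, pos = best_match(image, template)
    return pos if score >= min_score else None

template = [[9, 1], [1, 9]]
frame_with = [[0, 0, 0, 0],
              [0, 18, 2, 0],   # same pattern, twice as bright
              [0, 2, 18, 0],
              [0, 0, 0, 0]]
frame_without = [[0, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]]

print(detect(frame_with, template))
print(detect(frame_without, template))
```

The normalization makes the score insensitive to overall brightness and contrast, so the brighter copy of the pattern still matches, while a frame without the pattern yields only a weak best score and is rejected.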

Answer 4

How low resolution are we talking? Could you also elaborate on the object? Is it a specific color? Does it have a pattern? The answers affect what you should be using.

Also, I might be reading your template matching statement wrong, but it sounds like you are overtraining it (by testing on the same video you extracted the object from??).

Answer 5

A Haar Cascade is going to require significant training data on your part, and will be poor for any adjustments in orientation.

Your best bet might be to combine template matching with an algorithm similar to CamShift in OpenCV (5.7 MB PDF), along with a probabilistic model (you'll have to figure this one out) of whether the truck is still in the image.
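One possible shape for such a probabilistic model, offered only as a starting point, is a two-state (present/absent) Bayes filter driven by the per-frame match score. Everything specific here is an assumption: the function name `update_presence`, the transition probabilities, and the crude likelihoods (score for "present", 1 - score for "absent"). It also assumes the score has been mapped into [0, 1].

```python
def update_presence(prior, score, p_stay=0.95, p_enter=0.05):
    """One step of a two-state presence filter; `score` must lie in [0, 1]."""
    # Predict: the truck may leave or enter the scene between frames.
    predicted = prior * p_stay + (1.0 - prior) * p_enter
    # Update: ASSUMED likelihoods, high scores being likelier when present.
    like_present = score
    like_absent = 1.0 - score
    numerator = like_present * predicted
    evidence = numerator + like_absent * (1.0 - predicted)
    return numerator / evidence if evidence > 0 else predicted

# Hypothetical per-frame match scores: the truck is visible, then leaves.
scores = [0.9, 0.9, 0.85, 0.3, 0.2, 0.2, 0.1]

belief = 0.5  # start undecided
history = []
for s in scores:
    belief = update_presence(belief, s)
    history.append(belief)

print([round(b, 3) for b in history])
```

The belief rises while the scores are high and decays over a few frames once they drop, so a single bad frame does not immediately end the track. In practice you would fit the likelihoods to score histograms from frames with and without the truck.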
