High-speed object matching and localization using gradient orientation features
Xinyu Xu*, Peter van Beek, Xiaofan Feng
Sharp Laboratories of America, 5750 NW Pacific Rim Blvd, Camas, WA, USA 98607
ABSTRACT
In many robotics and automation applications, it is often required to detect a given object and determine its pose
(position and orientation) from input images with high speed, high robustness to photometric changes, and high pose
accuracy. We propose a new object matching method that improves efficiency over existing approaches by decomposing
orientation and position estimation into two cascaded steps. In the first step, an initial position and orientation are found
by matching with Histograms of Oriented Gradients (HOG), reducing the orientation search from 2D template matching to 1D
correlation matching. In the second step, a more precise orientation and position are computed by matching based on
Dominant Orientation Templates (DOT), using robust edge orientation features. The cascaded combination of the HOG
and DOT features for high-speed and robust object matching is the key novelty of the proposed method. Experimental
evaluation was performed with real-world single-object and multi-object inspection datasets, using software
implementations on an Atom CPU platform. Our results show that the proposed method achieves significant speed
improvement compared to an already accelerated template matching method at comparable accuracy performance.
Keywords: Object matching, fast object search, fast template matching, histogram of oriented gradients, dominant
orientation template, coarse to fine search, factory inspection, machine vision, robotics
1. INTRODUCTION
In this paper, we propose a method for object detection and localization in 2-D images. In particular, given an object of
interest in a model image, the goal is to automatically detect the object in an input image, as well as to determine its pose
(e.g. position, orientation, and scale). In the input image, the object of interest may have undergone geometric transforms
(e.g. rotation, zoom) and photometric changes (e.g. brightness, contrast, blur, noise). The basic task of detecting the
presence of an object in the input image may be referred to as object detection. The problem also includes object
localization, referring to the accurate estimation of the object's position, orientation angle, and scaling factor with respect
to a reference. Systems for detection and localization may involve a search stage; hence the process may also be called
image search. This problem is important in applications such as automated manufacturing and inspection in factory
production lines, product order fulfillment in automated warehouses, and service robotics.
The canonical set-up is illustrated in Figure 1, along with a sample result of the method proposed in this paper.
Typically, the relevant characteristics of the object of interest may be extracted, modeled or learned in a prior analysis
stage, and are assumed known before the main image search stage is performed. This model extraction stage is
considered off-line, while the main image search stage is considered on-line. Multiple instances of the object of interest
may appear in the input image. A very important goal is to detect and localize multiple objects with very high speed, i.e.
short processing time per input image for the on-line search stage. Another important goal is that the system must be
capable of handling a wide variety of object types, either with distinguishing features such as sharp corners or significant
edges, or with few such features. Another important goal is for the system to be robust to non-essential changes in the
object’s appearance due to lighting or illumination changes, blur, noise, and other degradations due to imaging
imperfections. Moreover, the system needs a high degree of repeatability even under varying and noisy conditions.
Another goal is that the system can withstand (partial) occlusion of the object or missing object parts in the input image.
*xxu@sharplabs.com; phone 1 360 834-8766; www.sharplabs.com
Figure 1. Overview of object detection and localization system. In the off-line stage, an object model / template database is extracted from the model image; in the on-line stage, the input image is searched against this database to produce the object pose.
There is a very wide body of literature on object detection and localization. One well-known class of approaches is based
on template matching. In this approach, the object model consists of one or more templates that are extracted from the
model image(s). The templates may include pixel-based features such as gray-level or color values, edge locations and
orientations, etc. Subsequently, the input image is searched for patterns that are similar to one or more of the templates.
This may involve a template matching or pattern matching stage in which the similarity between a template and a pattern
from the input image is computed. A major problem with template matching approaches based on exhaustive search is
that the processing time is too high.
Another major class of approaches is based on feature point matching, e.g. SIFT7. This approach starts with detection of
key points (e.g. corner points). The local features of key points are captured using descriptors or a statistical model.
Correspondences between individual key points in the model and input image can be found either by matching feature
descriptors or using a classification approach. Subsequently, a global transform describing the object's position, rotation
and scale may be robustly determined based on the individual point correspondences. A problem with this class of
approaches is that it relies on the presence of stable key points, such as corner points, for which the location can be
reliably determined. Several types of man-made objects common in industrial inspection and service robotics
applications may not contain such stable key points.
Another class of approaches includes training-based or learning-based object detection methods9. These methods utilize
sets of training samples of the object of interest to build a statistical model. Subsequently, a statistical classification
approach is used in the on-line search stage. A problem with such methods is that a large number of training samples
may be required in order to learn the relevant object characteristics properly, and that the training or learning stage may
require significant time. In practical scenarios, new objects may have to be learned frequently, and limited example
images may be available.
In this paper, we propose to combine two successful object detection approaches, namely Dominant Orientation
Templates (DOT)3,4 and Histogram of Oriented Gradients (HOG) descriptors1. Both methods are based on the use of
local gradient features and local edge orientations as the main feature, which have been shown to be highly robust, allowing
invariance to illumination variations, noise and blur. Both methods also include local pooling of feature responses over
small spatial cells or blocks, providing robustness to small changes in edge position or small object deformations. Our
overall framework builds specifically on the DOT concept3, based on template matching. Template matching considers
the global object and works very well for a large variety of object shapes and appearances, including objects with little
texture and few or no stable key points. It does not require extraction of precise contour points, which can be fragile in
practice. It does not require compilation of a large training set or a training stage.
To make template matching successful in practical applications, it is critical to use acceleration techniques to avoid
exhaustive search and to use robust features and similarity measures. A conventional approach for template matching is
to use Normalized Cross-Correlation (NCC) between a template and a sliding window in the input image6. NCC is
robust to linear changes between the signal values in the model template and the input image, but not to non-linear signal
changes. A fast method based on a similarity measure that is robust to non-linear signal changes was proposed2. Methods
for fast template matching and high-speed object localization have been proposed recently5,8. General techniques for
speeding up object matching include coarse-to-fine search (using an image pyramid representation), transform-domain
processing (to compute cross-correlation), and the use of integral images.
In object search, multiple templates may be needed that represent different views of a single
object. In the 2-D matching case, multiple templates can be pre-computed off-line corresponding to different global
orientation angles of a 2-D object. Hence, multiple templates need to be used during the on-line search stage. Significant
acceleration can be achieved by considering the relation between multiple templates that represent different views of the
object. In particular, we use a single Histogram of Oriented Gradients descriptor covering the entire 2-D object as a
rotation-invariant representation (up to a shift) of the object. This single descriptor can be used in a fast pre-search stage
that provides coarse estimates of the position and orientation of candidate objects in the input image. This fast pre-search
stage rules out many locations in the input image and narrows down the feasible orientation angles for the object at
promising locations. This avoids scanning the entire image with multiple templates representing the different possible
global orientations of the object. In addition, 1D HOG descriptors can be matched very efficiently using 1D cross-correlation.
The combination of HOG and DOT as described above serves as a coarse search stage for our overall method. Our
application requires accurate estimates of the object positions and orientation angles. In addition we require searching for
multiple object matches. Hence, we utilize additional DOT matching stages for position and orientation refinement.
During each stage a list of candidate object matches is maintained; candidate matches are added to this list during the
initial search stages, and candidates may be removed or merged after subsequent stages.
In section 2, we describe the proposed object matching and localization method in more detail. In section 3, we provide
experimental results demonstrating the high detection performance and robustness of the proposed method, as well as the
high pose estimation accuracy, and high speed. We conclude in section 4.
2. PROPOSED OBJECT MATCHING AND LOCALIZATION METHOD
2.1 Overview
The algorithm flow of the proposed method is shown in Figure 2. We first discuss offline model image processing, and
then online input image processing.
In the offline model image processing phase, we extract both the Histogram of Oriented Gradients (HOG)1 descriptor and
Dominant Orientation Templates (DOT)3 descriptors from the model ROI region. The HOG descriptor is computed as a
single 1-dimensional histogram of the gradient orientations in the model ROI region, and is used to determine the input
object orientation by matching the model HOG descriptor against the HOG descriptor of a particular area in the input
image in a sliding-window manner. The HOG descriptor has a few key advantages over other descriptors for fast and
robust object matching. The first advantage is that we can quickly find a coarse object orientation with HOG matching,
since a global object rotation within the 2D image plane manifests as a shift of the bin indices of the local orientation
histogram. This eliminates the need to rotate the model ROI template many times and match each rotation against the
input (as in classical template matching), yielding substantial time savings. The second advantage is that,
since the HOG descriptor operates on localized cells, it is more robust against geometric changes (except rotation) and
photometric changes, including position and scale variations, lighting or illumination change, blur and noise. However,
HOG has some limitations in matching. First, the position estimated by HOG is not accurate, because the HOG matching
scores are not sufficiently discriminative; second, HOG cannot distinguish an object from the same object rotated by
180°, since their HOG descriptors are identical. To overcome these limitations, we employ matching with DOT.
During offline model image processing, we compute DOT descriptors for each rotated model ROI at fine angle
resolution. These DOTs will be used to determine the object position at fine precision. DOT is chosen because it is fast to
compute, compact in memory, resilient to photometric transformations (blur, noise, low contrast), robust against
occlusion, and able to deal with untextured objects.
Figure 2. Algorithm flow of the proposed method.
In the online input image processing phase, the orientation and position of object(s) are determined via three stages. In
the first step of the coarse search, we compute a coarse orientation map by matching the model HOG with the input HOG in a
sliding-window manner. This step also yields a score map whose local peaks correspond to candidate object positions.
We then compute an updated score map by matching the input DO of each sliding window with the model DOT of the
particular orientation given by the coarse orientation map. We then retain the best matches with the highest scores by
finding the local peaks in the score map, and remove duplicate/close matches whose orientation and position are similar
to each other. The coarse search provides initial candidate locations at low/block resolution and a rough angular
orientation. In the second, middle search stage, the algorithm refines the spatial position and the angular orientation
by matching the model low/block resolution DOTs of different orientations with the input low/block resolution DO image in a
small spatial neighborhood. In this step we also resolve the 180° orientation ambiguity using an additional round of search in
a small neighborhood around an angle that is offset by 180° from the previously refined angle. Next, we retain
the best matches resulting from the middle search and remove duplicate/close matches. In the last, fine search stage, we
perform further refinement to obtain pixel/sub-pixel position estimates and 1-degree/sub-degree angle estimates using the full
resolution DOT plus sub-pixel and sub-angle interpolations.
The cascaded estimation from coarse to fine and the complementary use of HOG and DOT result in a very efficient and
very accurate object matching method. The HOG descriptor reduces the orientation search from 2D template matching (as done
by classical template matching) to 1D correlation matching, and only one HOG feature vector is needed to find the
rough orientation (e.g. at 10° resolution) of the target object, eliminating the need to rotate the model template many
times and match each rotation against the input, which leads to substantial time savings. In the middle search stage, both the
orientation and position search take place in a small angular and spatial neighborhood around the orientation and
position previously found by the coarse search, which greatly reduces the search space. The fine search stage
improves angular and position estimation precision by matching at pixel/sub-pixel position resolution and 1-degree/sub-degree orientation resolution, leading to more precise localization. In addition, we employ the HOG and DOT features
complementarily, to compensate for each other's shortcomings: HOG is fast at arriving at the object orientation but not
precise in position estimation, while DOT gives good position and orientation estimates but is slow, as it requires a large
number of matches, scanning the whole image and rotating the template at the desired angular resolution.
2.2 Feature extraction
We describe the extraction of the HOG and DO features in this section. In the subsequent sections, we will use model DOT to
refer to the dominant orientation feature image computed for the model image, and input DO to refer to the
dominant orientation feature image computed for the input image.
HOG and DO share similar processing in the first few steps, which include image pre-processing (smoothing, down-sampling), computing horizontal and vertical gradients at edges, and computing the orientation angle at edges. In our method,
the range of the orientation angle in both HOG and DO is 0°~180° (not 0°~360°), as we find the former to be more
invariant to contrast inversions.
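As an illustration of this shared front end, the minimal Python sketch below (our illustration, not the authors' C/C++ implementation; the Sobel operator, the magnitude threshold, and the function names are assumptions) computes a gradient orientation image folded into the 0°~180° range:

import numpy as np
from scipy import ndimage

def orientation_image(gray, mag_thresh=20.0):
    """Gradient orientation in [0, 180) degrees at edge pixels.

    Folding orientations modulo 180 degrees makes the feature invariant
    to contrast inversion: an edge and the same edge with inverted
    contrast receive the same orientation.
    """
    gray = gray.astype(np.float32)
    gx = ndimage.sobel(gray, axis=1)              # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)              # vertical gradient
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # fold 0..360 into 0..180
    ang[mag < mag_thresh] = -1.0                  # mark non-edge pixels invalid
    return ang, mag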
Figure 3. Computation of the HOG descriptor for a ROI.
Figure 4. Computation of Dominant Orientation feature.
After these steps, for HOG, we select the orientation with the largest gradient magnitude, i.e. the dominant orientation, in each 4x4 block
and quantize it into a discrete index. Multiple dominant orientations can also be combined and quantized into one index.
The angle quantization interval determines the HOG orientation estimation precision: the smaller the quantization
interval, the more precise the estimated orientation, but the longer the HOG 1D correlation matching takes. In our
application we quantize the angle at 10° precision, which yields a good balance between speed and accuracy in the
coarse search stage. Finally, we compute the histogram of the orientation indices over the 4x4 blocks encapsulated by the
ROI under consideration. This results in an 18-dimensional histogram vector for a ROI. Figure 3 shows the process of
computing the HOG descriptor for a ROI region.
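A simplified sketch of this descriptor computation follows (it builds on the orientation_image sketch above; assigning a single dominant orientation per 4x4 block is one of the variants described in the text, and the unweighted histogram is an assumption):

import numpy as np

def hog_descriptor(ang, mag, n_bins=18):
    """18-bin orientation histogram of a ROI (10 degrees per bin over 0..180).

    Each 4x4 block contributes the quantized index of its dominant
    orientation, i.e. the orientation at the strongest gradient in the
    block; blocks without edge pixels contribute nothing.
    """
    h, w = ang.shape
    hist = np.zeros(n_bins, dtype=np.float64)
    bin_width = 180.0 / n_bins
    for by in range(0, h - 3, 4):
        for bx in range(0, w - 3, 4):
            a = ang[by:by + 4, bx:bx + 4]
            m = mag[by:by + 4, bx:bx + 4]
            valid = a >= 0.0
            if not valid.any():
                continue                          # no edges in this block
            m = np.where(valid, m, 0.0)
            k = np.unravel_index(np.argmax(m), m.shape)
            hist[int(a[k] // bin_width) % n_bins] += 1.0
    return hist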
For DO, we retain the 2D dominant orientation image, as opposed to collapsing it into one 1D histogram vector. The
orientation angle at each pixel is encoded into a byte: the angle is quantized into 6 levels, and each orientation is
indicated by setting the corresponding bit to 1. To improve matching efficiency, we combine the dominant orientations
within a 4x4 block into one byte, each bit corresponding to an angular orientation. Moreover, to reduce the effect of
compared angles falling on opposite sides of quantized angular boundaries, we allow the byte codes of neighboring
orientations to overlap. This measure improves the robustness of DO to noise and other small changes between the input
and model image. More details can be found in Ref. 3. We compute two types of DO: the first type is at
low/block resolution, and the second type is at normal/pixel resolution. For the model image, we compute the DOs of the model
ROI rotated at every 1° for both types and store them as templates for later matching. The first type of model DOTs
will be used during the coarse and middle search stages; the second type will be used during the fine search stage.
Figure 4 shows the process of computing the DOT descriptor.
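The byte encoding can be sketched as follows (an illustrative simplification: every quantized edge orientation present in the 4x4 block is OR-ed into the byte rather than only the most dominant ones, and the one-level overlap to each neighboring bin is hard-coded):

import numpy as np

def encode_do(ang, block=4):
    """Encode the edge orientations of each block into one byte.

    Orientations (0..180 degrees) are quantized into 6 levels, one bit
    per level; the bits of the two neighboring levels are also set
    ("overlap"), so angles near a quantization boundary still match.
    """
    h, w = ang.shape
    bh, bw = h // block, w // block
    codes = np.zeros((bh, bw), dtype=np.uint8)
    for by in range(bh):
        for bx in range(bw):
            a = ang[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            valid = a >= 0.0
            if not valid.any():
                continue                          # block has no edges
            code = 0
            for lvl in set(int(v) for v in (a[valid] // 30.0) % 6):
                code |= (1 << lvl) | (1 << ((lvl + 1) % 6)) | (1 << ((lvl - 1) % 6))
            codes[by, bx] = code
    return codes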
2.3 Offline model image processing
During offline model image processing, we compute the HOG of the maximum region of all rotated model ROIs. Using
the maximum region of all rotated ROIs guarantees that the object, rotated to any possible angle, is encapsulated by this
ROI, so that every edge pixel is counted toward the histogram computation. This HOG descriptor is
represented as a 1D vector. We then compute the DOT for each rotated model ROI at every 1°. These DOTs are
computed at both block resolution and pixel resolution, as described in section 2.2.
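Putting the pieces together, the offline stage could be sketched as below, reusing the illustrative functions from section 2.2 (the use of scipy's bilinear rotation and the padded square ROI are assumptions; the block and pixel resolution DOTs are collapsed into one call for brevity):

import numpy as np
from scipy import ndimage

def build_model(model_roi, angle_step=1):
    """Offline stage: one HOG for the padded ROI, one DOT per rotation.

    model_roi is a grayscale ROI padded to the bounding square of all
    its rotations, so every edge pixel stays inside the region at any
    angle and is counted in the histogram.
    """
    ang, mag = orientation_image(model_roi)
    model_hog = hog_descriptor(ang, mag)          # single 1D descriptor
    model_dots = {}
    for theta in range(0, 360, angle_step):
        rot = ndimage.rotate(model_roi, theta, reshape=False, order=1)
        rot_ang, _ = orientation_image(rot)
        model_dots[theta] = encode_do(rot_ang)    # template for this angle
    return model_hog, model_dots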
2.4 Coarse search
Once the system receives an input image, the method computes the pixel and block resolution orientation index images,
where each pixel corresponds to an orientation index (18 levels) as in the HOG descriptor, and the
pixel and block resolution DO images, where the orientation is quantized into 6 levels. We perform three cascaded
coarse-to-fine search steps to localize the object pose. This section is devoted to the coarse search stage.
Figure 5. Finding coarse object orientation by locating the angle that maximizes the cross correlation score between model
HOG and input patch HOG.
The coarse search stage is divided into two steps. In the first step, we find the coarse orientation of object(s) with HOG
matching. For each local input patch [x y wm hm] (wm and hm are the width and height of the maximum model ROI), we
compute its HOG at low/block resolution and match it with the model low/block resolution HOG using normalized
cross correlation (other histogram comparison metrics, such as Chi-square, histogram intersection, Bhattacharyya
distance10 or Kullback–Leibler divergence11, can also be used). The orientation that yields the highest correlation score is
deemed the orientation of the object in the local input patch. When computing the HOG of a local input patch, in order
to prevent edge pixels of the neighboring region from being included in the histogram of the current patch,
we apply a disk mask to mask out those neighboring edge pixels. The radius of the disk mask is computed as half
of the maximum width and height of the model ROI. Figure 5 graphically illustrates the coarse search step. This step yields a
coarse orientation map, where each pixel represents the coarse orientation (i.e. at 10° resolution) of the object encapsulated
in the local input patch, and a score map, whose local peaks correspond to the estimated object positions
at low/block resolution.
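Because a rotation of the object by one bin width circularly shifts its orientation histogram by one bin, the per-patch orientation estimate reduces to a 1D circular normalized cross correlation, as in the sketch below (the normalization and tie-breaking details are our assumptions):

import numpy as np

def coarse_orientation(model_hog, patch_hog, bin_deg=10):
    """Coarse angle via 1D circular NCC between model and patch HOGs.

    The shift that maximizes the correlation gives the coarse object
    orientation (modulo 180 degrees); the peak score doubles as this
    patch's entry in the position score map.
    """
    m = (model_hog - model_hog.mean()) / (model_hog.std() + 1e-9)
    p = (patch_hog - patch_hog.mean()) / (patch_hog.std() + 1e-9)
    n = len(m)
    scores = np.array([np.dot(np.roll(m, k), p) / n for k in range(n)])
    best = int(np.argmax(scores))
    return best * bin_deg, float(scores[best])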
However, it is challenging to reliably estimate object positions from the HOG matching score map: first, the
HOG score map is not sparse and thus is not sufficiently discriminative to indicate true object positions; second, high
scores tend to appear in a large continuous region; last, in crowded areas the cross correlation score tends to be higher
than in non-crowded areas, even if a non-crowded area could be a more accurate position. In order to reliably estimate
object positions, in the second step of the coarse search, we employ the DO feature, which is more discriminative since it uses
the whole 2D template for matching.
Given the model DOTs rotated at every n degrees (n is given by the angle quantization interval in HOG), the input DO
image, the coarse orientation map, and the score map estimated by the first step of the coarse search, we refine the object
positions as follows. For each pixel [x y] in the input DO image, we first extract the coarse orientation from the
orientation map at [x y] and retrieve the corresponding model DOT of that particular orientation. Next, the model DOT of
that orientation is matched with the local input DO by a per-pixel byte AND, and the number of non-zero bytes is counted as the DO matching score. This results in an updated score map. Since both the model DOT and input DO are
still at low/block resolution, the local peaks in the updated score map correspond to object positions at low/block
resolution.
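The per-window DO matching score then amounts to a byte-wise AND followed by a count of the non-zero results, e.g. (bounds checking omitted; the array layout is an assumption):

import numpy as np

def dot_score(model_dot, input_do, y, x):
    """DO matching score of a template at block position (y, x).

    A per-block byte AND is non-zero whenever the model and the input
    share at least one (spread) orientation level, so the count of
    non-zero bytes measures how well the template explains the local
    edge orientations.
    """
    h, w = model_dot.shape
    window = input_do[y:y + h, x:x + w]
    return int(np.count_nonzero(model_dot & window))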
2.5 Coarse orientation search with optimized HOG computation
The complexity of computing the histogram at each input patch is O(r), where r is half the length of the longest side of the
patch. This is too slow for the HOG matching approach to be used in real-time
applications. In this paper, a new, simple, yet much faster histogram computation algorithm is proposed. The proposed
algorithm maintains one histogram for each column in the image. This set of histograms is preserved across rows for the
entirety of the process. Each column histogram accumulates 2r + 1 adjacent pixels and is initially centered on the first
row of the image. The kernel histogram (i.e. the histogram of the sliding window) is computed by summing 2r + 1
adjacent column histograms. In effect, we break up the kernel histogram into the union of its columns, each
of which maintains its own histogram. While computing the HOG for the entire input image, all histograms can be kept
up to date in constant time with a two-step approach.
Figure 6. The two steps of the fast histogram computation algorithm.
Consider the case of moving to the right from one pixel to the next, as shown in Figure 6. The column histograms to the
right of the kernel have not yet been processed for the current row, so they are still centered one row above. The first step
consists of updating the column histogram to the right of the kernel by subtracting its topmost pixel and adding one new pixel
below it (Figure 6a), which effectively lowers that column histogram by one row. The second step consists of updating the
kernel histogram by adding the newly updated column histogram on the right and subtracting the leaving column histogram
on the left (Figure 6b). The initialization consists of
accumulating the first r rows in the column histograms and computing the kernel histogram from the first r column
histograms.
histograms. The coarse orientation estimation algorithm with optimized histogram computation is listed in pseudo code
in Figure 7.
Figure 7. Coarse orientation search with fast histogram computation.
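As the pseudo code of Figure 7 is not reproduced here, the following sketch of the two-step update for one interior image row may help fix the idea (our illustrative re-implementation, not the figure's pseudo code; border handling, the outer row loop, and the initialization described above are omitted):

import numpy as np

def sweep_row(idx_img, col, y, r):
    """Sweep row y with the two-step constant-time histogram update.

    `col` holds one histogram per image column, each centered on row
    y - 1 for the columns not yet visited in this row. Step 1 lowers
    the entering column histogram by one row (subtract its topmost
    pixel, add the new pixel below); step 2 updates the kernel
    histogram by adding the entering column and subtracting the
    leaving one. Yields (y, x, kernel histogram) per window position.
    """
    w = idx_img.shape[1]
    for x in range(2 * r + 1):                    # lower the first 2r+1 columns
        col[x, idx_img[y - r - 1, x]] -= 1
        col[x, idx_img[y + r, x]] += 1
    kernel = col[:2 * r + 1].sum(axis=0)          # first kernel of the row
    yield y, r, kernel.copy()
    for x in range(r + 1, w - r):
        e = x + r                                 # entering column
        col[e, idx_img[y - r - 1, e]] -= 1        # step 1: lower it one row
        col[e, idx_img[y + r, e]] += 1
        kernel += col[e] - col[x - r - 1]         # step 2: update the kernel
        yield y, x, kernel.copy()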
2.6 Middle search
The coarse search stage uses a large angle quantization interval in both the HOG (10°) and DO (30°) features for improved
matching efficiency; as a result, the estimated orientation is at a rough angular resolution. We therefore perform a middle
search to refine the object orientation to a finer angular resolution.
Middle search is performed at low/block resolution. At the start, the input orientation is one of the coarse rotation angles
(e.g. at 10° resolution); at the end, the output orientation is any of the (fine) rotation angles (e.g. at 2° resolution). Both the
input and output positions remain at the low/block resolution level. The low/block resolution
model DOTs at different orientations around the previously found coarse orientation are matched to the low/block
resolution input DO image to find the optimal fine angle. At each rotation angle, there is also a local spatial search
across positions around the previous match position.
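A sketch of this local refinement is given below, reusing the dot_score sketch from section 2.4 (the angular and spatial search radii and the 2° step are illustrative values; the paper does not fix them at exactly these numbers):

def middle_search(model_dots, input_do, pos, theta, ang_rad=10, ang_step=2, sp_rad=2):
    """Refine angle and block position by a small exhaustive local search.

    Only angles within +/- ang_rad degrees of the coarse angle and
    positions within +/- sp_rad blocks of the coarse position are
    scored, which keeps the search space small.
    """
    best_score, best_angle, best_pos = -1, theta % 360, pos
    for a in range(theta - ang_rad, theta + ang_rad + 1, ang_step):
        dot = model_dots[a % 360]                 # precomputed rotated template
        for dy in range(-sp_rad, sp_rad + 1):
            for dx in range(-sp_rad, sp_rad + 1):
                s = dot_score(dot, input_do, pos[0] + dy, pos[1] + dx)
                if s > best_score:
                    best_score = s
                    best_angle = a % 360
                    best_pos = (pos[0] + dy, pos[1] + dx)
    return best_score, best_angle, best_pos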
Figure 8. Method for resolving HOG orientation estimation ambiguity during middle search.
During middle search, we also resolve the ambiguity that HOG cannot distinguish an object from the same object
rotated by 180°. First, an initial, coarse orientation (whose range is 0 to 180°) is identified by the coarse search with
HOG (section 2.4). Next, this initial orientation is refined during the middle search stage by matching with the DO feature in a
small angular and spatial neighborhood. Then, the middle search is repeated in a small angular
neighborhood around an orientation that is offset by 180° from the previously refined angle, again with a small spatial
search. The best orientation from the two middle searches is chosen as the correct object orientation, which may
subsequently be refined further in the fine search stage. This strategy resolves the orientation estimation ambiguity while
preserving matching efficiency, since both rounds of middle search are performed in a small spatial and angular
neighborhood. Figure 8 illustrates this process.
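In code, the disambiguation amounts to running the middle search twice, around θ and around θ + 180°, and keeping the higher-scoring result (a minimal sketch reusing the middle_search function above):

def resolve_180_ambiguity(model_dots, input_do, pos, theta, **kw):
    """Middle search around theta and theta + 180; keep the better match.

    HOG orientations live on a half circle, so a coarse angle of theta
    could really be theta + 180 degrees; the spatial DOT scores of the
    two refined candidates disambiguate them.
    """
    s1, a1, p1 = middle_search(model_dots, input_do, pos, theta, **kw)
    s2, a2, p2 = middle_search(model_dots, input_do, pos, theta + 180, **kw)
    return (a1, p1) if s1 >= s2 else (a2, p2)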
2.7 Fine search
The position estimated by the coarse and middle search is at low/block resolution, since low/block resolution HOG and DO
features are used. It is hence necessary to perform a fine search to obtain pixel/sub-pixel level position estimates.
We also refine the orientation from degree to sub-degree resolution during the fine search.
This refinement stage is performed at the normal/pixel resolution. Both orientation and position are refined by local
search. At the start, the input position is on the low/block resolution grid; at the end, the output position is on the normal
resolution grid. The orientation angle is likewise refined by a search around the orientation angle estimated by the
middle search. For each angle, the pixel resolution model DOT at 0° is first rotated to the proper (candidate) angle
before searching the different positions. This pixel resolution model DOT at the candidate rotation angle is then matched
with the input DO image at different positions to yield the optimal position. All matching scores across angles and
positions are stored; these can be used later for fractional rotation angle and position estimation if necessary.
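One standard way to obtain fractional estimates from the stored scores is parabolic interpolation through the peak score and its two neighbors, applied independently along the angle axis and each position axis; the paper does not specify its interpolation scheme, so the sketch below is an assumption:

def parabolic_peak(s_minus, s0, s_plus):
    """Fractional offset of a score peak from three neighboring samples.

    Fits a parabola through the scores at grid offsets -1, 0, +1 and
    returns the abscissa of its maximum, in (-0.5, 0.5) grid units;
    added to the integer peak location, this gives a sub-pixel position
    or sub-degree angle estimate.
    """
    denom = s_minus - 2.0 * s0 + s_plus
    if denom >= 0.0:                              # not a strict local maximum
        return 0.0
    return 0.5 * (s_minus - s_plus) / denom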
3. EXPERIMENTS AND RESULTS
3.1 Datasets and experiment platform
To test the accuracy and speed of the proposed system, we collected image datasets representative of industrial inspection
applications, including images of automotive, metal, food, pharmaceutical, packaging and electronics products. These
datasets were captured using a SHARP factory inspection camera. A wide variety of objects are captured in
these datasets, including transistors, springs, screws, coins, IC chips, and metal parts from different industries. The objects
exhibit a large amount of geometric change (e.g. in-plane rotation) and photometric change (blur, noise, contrast, object
defects, complex background), making this dataset very challenging. A few sample input images with different target
objects are shown in Figure 9. We evaluated the proposed method in terms of speed, accuracy and robustness.
Figure 9. Sample input images in the testing dataset. The small image is the model image containing the target object.
For accuracy evaluation, we benchmarked the proposed method against a baseline template matching method on a
PC platform. The baseline template matching method uses DOT as the main feature and employs a similar coarse-to-fine
search architecture as the proposed method. The principal difference between the two methods is that the proposed
method employs the HOG feature to compute a coarse orientation during the coarse search stage, whereas the baseline
template matching method uses DOT for both orientation and position estimation. In other words, the baseline method
does not use HOG to accelerate the coarse search. Figure 10 shows side-by-side object localization results obtained
by the proposed method (left panel) and the baseline template matching method (right panel) for the screw, spring and
transistor datasets. Both methods exhibit high detection accuracy and repeatability on these challenging multi-object
datasets, and their detection accuracy is comparable.
Figure 10. Detection accuracy benchmark comparison between proposed method (left) and baseline method (right) on screw
(top), spring (middle) and transistor (bottom) dataset.
For robustness evaluation, we tested the proposed method on a wide variety of datasets with a large range of in-plane rotation changes (with no or small scale change) and photometric variations, including blur, noise, contrast, object
defects and complex backgrounds. Figure 11 (a) to (f) shows the evaluation results under these different conditions; in
each case, the top row shows the input images and the bottom row shows the result images obtained by the
proposed method. The proposed method is shown to be robust to geometric changes and photometric changes,
which makes it well suited to real-world industrial inspection applications.
Figure 11. Robustness evaluation results of the proposed method: (a) multiple objects; (b) defective object; (c) complex background; (d) object with decreasing contrast; (e) object with increasing noise level; (f) object with increasing blur level.
For speed evaluation, we first benchmarked our implementation of the proposed approach against our
implementation of the baseline template matching method. Both methods are implemented in C/C++ and use the Intel
Integrated Performance Primitives (IPP) library for software acceleration. The timing performance of both implementations
was measured on a netbook PC with an Intel Atom CPU. The processing time results are shown in the left and
middle columns of Table 1. The image sizes for the spring, transistor and screw datasets are 512x480, 1600x1200, and 1600x1200
respectively, and the model image sizes for these three objects are 61x91, 66x139 and 66x139 respectively. The proposed
technique achieved a very large reduction in processing time in the case of the transistor and screw datasets. It
did not achieve a reduction in overall processing time in the case of the spring dataset; the reason is that, in this
case, the implementation retained a much larger number of candidate objects after the coarse search stage, which were then used
for refinement during the later search stages. For the spring dataset, even though the proposed technique was able to
speed up the coarse search, the saving was not able to make up for the increase in processing time
caused by the refinement stages.
Additional benchmarking was performed by measuring the processing time of an optimized and embedded
implementation of the baseline template matching method. This implementation runs on the target embedded
inspection camera platform, with the same Atom CPU. The processing time results are shown in the right column of
Table 1. The results provide additional evidence of the speed-up provided by the proposed combination of HOG and
DOT.
Table 1. Speed benchmark evaluation. The columns show processing time results for: (left) our implementation of the
proposed (HOG+DOT) approach with timing measured on a netbook PC platform; (middle) our implementation of the
baseline (DOT) approach with timing measured on a netbook PC platform; (right) an embedded and optimized
implementation with timing measured on the inspection camera platform.
4. CONCLUSION
In this paper, we proposed a new template matching method that improves efficiency over existing approaches by
decomposing orientation and position estimation into two cascaded steps. Our results show that the proposed method
achieves a significant speed improvement at comparable accuracy. The proposed method also achieves high
robustness to photometric changes (contrast, background illumination, blur, noise and object defects) and geometric
changes (rotation, translation and small scale change).
REFERENCES
[1] Dalal, N., and Triggs, B., “Histograms of oriented gradients for human detection,” Proc. IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), vol. 1, 886-893 (2005).
[2] Hel-Or, Y., Hel-Or, H., and David, E., “Fast template matching in non-linear tone-mapped images,” Proc. 13th
IEEE International Conference on Computer Vision (ICCV), 1355-1362 (2011).
[3] Hinterstoisser, S., Lepetit, V., Ilic, S., Fua, P., and Navab, N., “Dominant orientation templates for real-time
detection of texture-less objects,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2257- 2264 (2010).
[4] Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., and Lepetit, V., “Gradient response maps for
real-time detection of texture-less objects,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 34, no.
5, 876-888 (2012).
[5] Lampert, C. H., Blaschko, M. B., and Hofmann, T., “Beyond sliding windows: Object localization by Efficient
Subwindow Search,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-8 (2008).
[6] Lewis, J. P., “Fast template matching,” Vision Interface 95, Canadian Image Processing and Pattern Recognition
Society, 120-123 (1995).
[7] Lowe, D. G., “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer
Vision, vol. 60, no. 2, 91-110 (2004).
[8] Sibiryakov, A., “Fast and high-performance template matching method,” Proc. IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 1417-1424 (2011).
[9] Viola, P., and Jones, M., “Robust Real-time Object Detection,” Int. Journal of Computer Vision, vol. 57 (2), 137-154 (2001).
[10] Bhattacharyya, A., “On a measure of divergence between two statistical populations defined by their probability
distributions,” Bulletin of the Calcutta Mathematical Society, 35: 99-109 (1943).
[11] Kullback, S., and Leibler, R. A., “On Information and Sufficiency,” Annals of Mathematical Statistics 22 (1): 79-86
(1951).