Patch tracking based on comparing its pixels

Tomáš Svoboda, svoboda@cmp.felk.cvut.cz
Czech Technical University in Prague, Center for Machine Perception
http://cmp.felk.cvut.cz
Last update: March 23, 2015

Talk Outline
- comparing patch pixels: normalized cross-correlation, SSD, ...
- KLT: gradient-based optimization
- good features to track

Please note that the lecture will be accompanied by several sketches and derivations on the blackboard and a few live interactive demos in Matlab.

What is the problem?

video: CTU campus, door of the G building

Tracking of dense sequences — camera motion

T - Template, I - Image. The scene is static, the camera moves.

Tracking of dense sequences — object motion

T - Template, I - Image. The camera is static, the object moves.

Alignment of an image (patch)

The goal is to align a template image T(x) to an input image I(x), where x is a column vector containing image coordinates [x, y]⊤. I(x) can also be a small subwindow within an image.

How to measure the alignment?

- What is the best criterial function? What are its desired properties (on a certain domain)? It should be convex (remember the optimization course?) and discriminative.
- How to find the best match, in other words, how to find the extremum of the criterial function?

Normalized cross-correlation

You may know it as the correlation coefficient (from statistics):

    ρ_{X,Y} = cov(X, Y) / (σ_X σ_Y) = E[(X − µ_X)(Y − µ_Y)] / (σ_X σ_Y)

where σ denotes standard deviation. Having a template T(k, l) and an image I(x, y), with T̄ the template mean and Ī(x, y) the mean of the image window at (x, y):

    r(x, y) = Σ_k Σ_l (T(k, l) − T̄)(I(x+k, y+l) − Ī(x, y))
              / √( Σ_k Σ_l (T(k, l) − T̄)² · Σ_k Σ_l (I(x+k, y+l) − Ī(x, y))² )

Normalized cross-correlation – in picture

[figure: criterial function ncc — the correlation surface over the image, with values ranging roughly from −0.6 to 0.8]

- well, definitely not convex
- but the discriminability looks promising
- very efficient in computation, see [3]
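The NCC criterion above can be written down directly. Below is a naive, unoptimized NumPy sketch (the function names ncc and ncc_surface are ours; a practical implementation would use the FFT-based speedup from [3] or Matlab's normxcorr2):

```python
import numpy as np

def ncc(template, window):
    """Normalized cross-correlation (correlation coefficient) between
    a template and an equally sized image window."""
    t = template.astype(float) - template.mean()
    w = window.astype(float) - window.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())
    if denom == 0.0:          # flat patch: correlation undefined
        return 0.0
    return (t * w).sum() / denom

def ncc_surface(image, template):
    """Evaluate the criterial function r(x, y) for every placement
    of the template inside the image."""
    th, tw = template.shape
    H, W = image.shape
    out = np.zeros((H - th + 1, W - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = ncc(template, image[y:y + th, x:x + tw])
    return out
```

The surface produced by ncc_surface is exactly the kind of non-convex but discriminative landscape shown on the slide: the best match is its maximum, with r = 1 at a perfect match.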
(For an efficient implementation in Matlab, check also normxcorr2.)

Sum of squared differences

    ssd(x, y) = Σ_k Σ_l (T(k, l) − I(x+k, y+l))²

[figure: criterial function ssd — the SSD surface over the image, values up to about 4·10⁷]

Sum of absolute differences

    sad(x, y) = Σ_k Σ_l |T(k, l) − I(x+k, y+l)|

[figure: criterial function sad — the SAD surface over the image, values up to about 2.2·10⁵]

SAD for the door part

[figure: criterial function sad evaluated for the door template]

SAD for the door part – truncated

[figure: criterial function sad_truncated] Differences greater than 20 intensity levels are counted as 20.

Normalized cross-correlation: how it works

live demo for various patches

Normalized cross-correlation: tracking

video. What went wrong? Why did it fail? Suggestions for improvement?

Tracking as an optimization problem

Finding extrema of a criterial function ... sounds like an optimization problem.

Kanade–Lucas–Tomasi (KLT) tracker:
- iteratively minimizes the sum of squared differences,
- it is a Gauss-Newton gradient algorithm,
- first published in 1981 as an image registration method [4],
- improved many times, most importantly by Carlo Tomasi [5, 6],
- free implementation(s) available, also part of the OpenCV library.

Importance in Computer Vision

After more than two decades, a project at CMU was dedicated to this single algorithm, with results published in a premium journal [1].
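The SSD and SAD criteria defined earlier, including the truncated SAD variant used for the door experiment, can be sketched as follows (a minimal NumPy sketch; the function names are ours):

```python
import numpy as np

def ssd(template, window):
    """Sum of squared differences between equally sized patches."""
    d = template.astype(float) - window.astype(float)
    return float((d * d).sum())

def sad(template, window, truncation=None):
    """Sum of absolute differences; with truncation, differences
    larger than `truncation` intensity levels count as `truncation`."""
    d = np.abs(template.astype(float) - window.astype(float))
    if truncation is not None:
        d = np.minimum(d, truncation)
    return float(d.sum())
```

Unlike NCC, both are dissimilarity measures: the best match minimizes them. Truncation makes SAD robust against a few grossly different pixels, such as a partial occlusion of the patch.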
It is part of a plethora of computer vision algorithms. Our explanation follows mainly the paper [1], which is also good reading for those interested in alternative solutions.

Links: KLT implementation http://www.ces.clemson.edu/~stb/klt/, OpenCV http://opencv.org/, project "Lucas-Kanade 20 Years On" https://www.ri.cmu.edu/research_project_detail.html?project_id=515&menu_id=261

Original Lucas-Kanade algorithm I

The goal is to align a template image T(x) to an input image I(x), where x is a column vector containing image coordinates [x, y]⊤. I(x) can also be a small subwindow within an image.

Consider a set of allowable warps W(x; p), where p is a vector of parameters. For translations,

    W(x; p) = [x + p₁, y + p₂]⊤

but W(x; p) can be arbitrarily complex. The best alignment p* minimizes the image dissimilarity

    Σ_x [I(W(x; p)) − T(x)]²

Original Lucas-Kanade algorithm II

In Σ_x [I(W(x; p)) − T(x)]², the term I(W(x; p)) is nonlinear! The warp W(x; p) may be linear, but the pixel values are, in general, non-linear; in fact, they are essentially unrelated to x.

Linearization: assume some p is known and seek the best increment ∆p. The modified problem

    Σ_x [I(W(x; p + ∆p)) − T(x)]²

is solved with respect to ∆p. When found, p is updated: p ← p + ∆p, and the process repeats.

Original Lucas-Kanade algorithm III

Linearize Σ_x [I(W(x; p + ∆p)) − T(x)]² by a first-order Taylor expansion (detailed explanation on the blackboard):

    Σ_x [I(W(x; p)) + ∇I⊤ (∂W/∂p) ∆p − T(x)]²

where ∇I⊤ = [∂I/∂x, ∂I/∂y] is the image gradient computed at W(x; p), and ∂W/∂p is the Jacobian of the warp.

Original Lucas-Kanade algorithm IV

Differentiating Σ_x [I(W(x; p)) + ∇I⊤ (∂W/∂p) ∆p − T(x)]² with respect to ∆p gives

    2 Σ_x [∇I⊤ ∂W/∂p]⊤ [I(W(x; p)) + ∇I⊤ (∂W/∂p) ∆p − T(x)]

and setting this equal to zero yields

    ∆p = H⁻¹ Σ_x [∇I⊤ ∂W/∂p]⊤ [T(x) − I(W(x; p))]

where

    H = Σ_x [∇I⊤ ∂W/∂p]⊤ [∇I⊤ ∂W/∂p]

is the (Gauss-Newton) approximation of the Hessian matrix.

The Lucas-Kanade algorithm — Summary

Iterate:
1. Warp I with W(x; p).
2. Warp the gradient ∇I with W(x; p).
3. Evaluate the Jacobian ∂W/∂p at (x; p) and compute the steepest descent image ∇I⊤ ∂W/∂p.
4. Compute H = Σ_x [∇I⊤ ∂W/∂p]⊤ [∇I⊤ ∂W/∂p].
5. Compute ∆p = H⁻¹ Σ_x [∇I⊤ ∂W/∂p]⊤ [T(x) − I(W(x; p))].
6. Update the parameters p ← p + ∆p.
until ‖∆p‖ ≤ ε

Example of convergence

video

Example of convergence

Convergence video: the initial state is within the basin of attraction.

Example of divergence

Divergence video: the initial state is outside the basin of attraction.

Example – on-line demo

Let's play and see ...

What are good features (windows) to track?

How to select good templates T(x) for image registration and object tracking? Recall

    ∆p = H⁻¹ Σ_x [∇I⊤ ∂W/∂p]⊤ [T(x) − I(W(x; p))]

where H = Σ_x [∇I⊤ ∂W/∂p]⊤ [∇I⊤ ∂W/∂p].

The stability of the iteration is mainly influenced by the inverse of the Hessian; we can study its eigenvalues. Consequently, the criterion of a good feature window is min(λ₁, λ₂) > λ_min (texturedness).

What are good features for translations?

Consider the translation W(x; p) = [x + p₁, y + p₂]⊤. The Jacobian is then

    ∂W/∂p = [1 0; 0 1]

and

    H = Σ_x [∇I⊤ ∂W/∂p]⊤ [∇I⊤ ∂W/∂p]
      = Σ_x [∂I/∂x, ∂I/∂y]⊤ [∂I/∂x, ∂I/∂y]
      = Σ_x [ (∂I/∂x)²        (∂I/∂x)(∂I/∂y) ]
            [ (∂I/∂x)(∂I/∂y)  (∂I/∂y)²       ]

Good image windows have varying derivatives in both directions. Homogeneous areas are clearly not suitable, and texture oriented mostly in one direction only would cause instability for this translation.

What are the good points for translations?

The matrix

    H = Σ_x [ (∂I/∂x)²        (∂I/∂x)(∂I/∂y) ]
            [ (∂I/∂x)(∂I/∂y)  (∂I/∂y)²       ]

should have large eigenvalues. We have seen this matrix already; where? In the Harris corner detector [2]! The matrix is sometimes called the Harris matrix.
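For the translational warp, H reduces to the 2×2 Harris matrix above, so the min(λ₁, λ₂) > λ_min texturedness test can be sketched as follows (the names harris_matrix and is_good_window and the threshold value are ours; np.gradient stands in for whatever derivative filter a real tracker uses):

```python
import numpy as np

def harris_matrix(window):
    """H = sum over the window of [Ix^2, Ix*Iy; Ix*Iy, Iy^2]."""
    gy, gx = np.gradient(window.astype(float))  # derivatives along y and x
    return np.array([[(gx * gx).sum(), (gx * gy).sum()],
                     [(gx * gy).sum(), (gy * gy).sum()]])

def is_good_window(window, lam_min=0.1):
    """Texturedness test: both eigenvalues of H must exceed lam_min."""
    lam = np.linalg.eigvalsh(harris_matrix(window))  # ascending order
    return bool(lam[0] > lam_min)
```

A homogeneous patch gives H ≈ 0 and fails; a ramp textured in one direction only has one (near-)zero eigenvalue and fails too; only windows with gradients varying in both directions pass, which is exactly what the Harris detector [2] responds to.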
Experiments - no occlusions

video

Experiments - occlusions

video

Experiments - occlusions with dissimilarity

video

Experiments - object motion

video

Experiments – door tracking

video

Experiments – door tracking – smoothed

video

Comparison of ncc vs KLT tracking

video

References

[1] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2004.
[2] C. Harris and M. Stephens. A combined corner and edge detector. In M. M. Matthews, editor, Proceedings of the 4th Alvey Vision Conference, pages 147–151, University of Manchester, England, September 1988. On-line copies available on the web.
[3] J. P. Lewis. Fast template matching. In Vision Interface, pages 120–123, 1995. Extended version published on-line as "Fast Normalized Cross-Correlation" at http://scribblethink.org/Work/nvisionInterface/nip.html.
[4] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674–679, August 1981.
[5] Jianbo Shi and Carlo Tomasi. Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 593–600, 1994.
[6] Carlo Tomasi and Takeo Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, April 1991.

End