Norm, distance, angle

• norm
• distance
• angle
• hyperplanes
• complex vectors

Euclidean norm

(Euclidean) norm of vector $a \in \mathbf{R}^n$:

\[
\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2} = \sqrt{a^T a}
\]

• if $n = 1$, $\|a\|$ reduces to absolute value $|a|$
• measures the magnitude of $a$
• sometimes written as $\|a\|_2$ to distinguish from other norms, e.g., $\|a\|_1 = |a_1| + |a_2| + \cdots + |a_n|$

Properties

Nonnegative definiteness

\[
\|a\| \geq 0 \quad \text{for all } a, \qquad \|a\| = 0 \quad \text{only if } a = 0
\]

Homogeneity

\[
\|\beta a\| = |\beta| \|a\| \quad \text{for all vectors } a \text{ and scalars } \beta
\]

Triangle inequality

\[
\|a + b\| \leq \|a\| + \|b\| \quad \text{for all vectors } a \text{ and } b \text{ of equal length}
\]

(proof on page 2-7)

Cauchy-Schwarz inequality

\[
|a^T b| \leq \|a\| \|b\| \quad \text{for all } a, b \in \mathbf{R}^n
\]

moreover, equality $|a^T b| = \|a\|\|b\|$ holds if:

• $a = 0$ or $b = 0$; in this case $a^T b = 0 = \|a\|\|b\|$
• $a \neq 0$ and $b \neq 0$, and $b = \gamma a$ for some $\gamma > 0$; in this case $0 < a^T b = \gamma \|a\|^2 = \|a\|\|b\|$
• $a \neq 0$ and $b \neq 0$, and $b = -\gamma a$ for some $\gamma > 0$; in this case $0 > a^T b = -\gamma \|a\|^2 = -\|a\|\|b\|$

Proof of Cauchy-Schwarz inequality

1. trivial if $a = 0$ or $b = 0$

2. assume $\|a\| = \|b\| = 1$; we show that $-1 \leq a^T b \leq 1$

\begin{align*}
0 \leq \|a - b\|^2 &= (a - b)^T (a - b) \\
&= \|a\|^2 - 2 a^T b + \|b\|^2 \\
&= 2(1 - a^T b)
\end{align*}

with equality only if $a = b$

\begin{align*}
0 \leq \|a + b\|^2 &= (a + b)^T (a + b) \\
&= \|a\|^2 + 2 a^T b + \|b\|^2 \\
&= 2(1 + a^T b)
\end{align*}

with equality only if $a = -b$

3. for general nonzero $a$, $b$, apply case 2 to the unit-norm vectors

\[
\frac{1}{\|a\|} a, \qquad \frac{1}{\|b\|} b
\]

RMS value

let $a$ be a real $n$-vector

• the average of the entries of $a$ is

\[
\operatorname{avg}(a) = \frac{a_1 + a_2 + \cdots + a_n}{n} = \frac{\mathbf{1}^T a}{n}
\]

• the root-mean-square value is the root of the average squared entry

\[
\operatorname{rms}(a) = \sqrt{\frac{a_1^2 + a_2^2 + \cdots + a_n^2}{n}} = \frac{\|a\|}{\sqrt{n}}
\]

Exercise

• show that $|\operatorname{avg}(a)| \leq \operatorname{rms}(a)$
• show that $\operatorname{avg}(b) \leq \operatorname{rms}(a)$ where $b = (|a_1|, |a_2|, \ldots, |a_n|)$

Triangle inequality from CS inequality

for vectors $a$, $b$ of equal length

\begin{align*}
\|a + b\|^2 &= (a + b)^T (a + b) \\
&= a^T a + b^T a + a^T b + b^T b \\
&= \|a\|^2 + 2 a^T b + \|b\|^2 \\
&\leq \|a\|^2 + 2 \|a\|\|b\| + \|b\|^2 \qquad \text{(by Cauchy-Schwarz)} \\
&= (\|a\| + \|b\|)^2
\end{align*}

• taking square roots gives the triangle inequality
• triangle inequality is an equality if and only if $a^T b = \|a\|\|b\|$ (see p.
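2-4)
• also note from line 3 that $\|a + b\|^2 = \|a\|^2 + \|b\|^2$ if $a^T b = 0$

Distance

the (Euclidean) distance between vectors $a$ and $b$ is defined as $\|a - b\|$

• $\|a - b\| \geq 0$ for all $a$, $b$; distance is equal to zero only if $a = b$
• triangle inequality: $\|a - c\| \leq \|a - b\| + \|b - c\|$ for all $a$, $b$, $c$

[figure: triangle with vertices $a$, $b$, $c$ and side lengths $\|a - b\|$, $\|b - c\|$, $\|a - c\|$]

• RMS deviation between $n$-vectors $a$ and $b$ is

\[
\operatorname{rms}(a - b) = \frac{\|a - b\|}{\sqrt{n}}
\]

Standard deviation

let $a$ be a real $n$-vector

• the de-meaned vector is the vector of deviations from the average

\[
a - \operatorname{avg}(a)\mathbf{1} =
\begin{bmatrix} a_1 - \operatorname{avg}(a) \\ a_2 - \operatorname{avg}(a) \\ \vdots \\ a_n - \operatorname{avg}(a) \end{bmatrix}
=
\begin{bmatrix} a_1 - (\mathbf{1}^T a)/n \\ a_2 - (\mathbf{1}^T a)/n \\ \vdots \\ a_n - (\mathbf{1}^T a)/n \end{bmatrix}
\]

• the standard deviation is the RMS deviation from the average

\[
\operatorname{std}(a) = \operatorname{rms}(a - \operatorname{avg}(a)\mathbf{1}) = \frac{\left\| a - ((\mathbf{1}^T a)/n)\mathbf{1} \right\|}{\sqrt{n}}
\]

• the de-meaned vector in standard units is

\[
\frac{1}{\operatorname{std}(a)} (a - \operatorname{avg}(a)\mathbf{1})
\]

Exercise

show that $\operatorname{avg}(a)^2 + \operatorname{std}(a)^2 = \operatorname{rms}(a)^2$

Solution

\begin{align*}
\operatorname{std}(a)^2 &= \frac{\|a - \operatorname{avg}(a)\mathbf{1}\|^2}{n} \\
&= \frac{1}{n} \left( a - \frac{\mathbf{1}^T a}{n}\mathbf{1} \right)^T \left( a - \frac{\mathbf{1}^T a}{n}\mathbf{1} \right) \\
&= \frac{1}{n} \left( a^T a - \frac{(\mathbf{1}^T a)^2}{n} - \frac{(\mathbf{1}^T a)^2}{n} + \left( \frac{\mathbf{1}^T a}{n} \right)^2 n \right) \\
&= \frac{1}{n} \left( a^T a - \frac{(\mathbf{1}^T a)^2}{n} \right) \\
&= \operatorname{rms}(a)^2 - \operatorname{avg}(a)^2
\end{align*}

Exercise: nearest scalar multiple

given two vectors $a, b \in \mathbf{R}^n$, with $a \neq 0$, find the scalar multiple $ta$ closest to $b$

[figure: vector $b$ and its nearest scalar multiple $ta$ on the line through $a$]

Solution

• squared distance between $ta$ and $b$ is

\[
\|ta - b\|^2 = (ta - b)^T (ta - b) = t^2 a^T a - 2t a^T b + b^T b
\]

a quadratic function of $t$ with positive leading coefficient $a^T a$

• derivative with respect to $t$ is zero for

\[
t = \frac{a^T b}{a^T a} = \frac{a^T b}{\|a\|^2}
\]

Exercise: mean of set of points

given $N$ vectors $x_1, \ldots, x_N \in \mathbf{R}^n$, find the $n$-vector $z$ that minimizes

\[
\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2
\]

[figure: points $x_1, \ldots, x_5$ and the point $z$]

$z$ is also known as the centroid of the points $x_1$, . . .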
, $x_N$

Solution: sum of squared distances is

\begin{align*}
\|z - x_1\|^2 + \|z - x_2\|^2 + \cdots + \|z - x_N\|^2
&= \sum_{i=1}^n \left( (z_i - x_{1i})^2 + (z_i - x_{2i})^2 + \cdots + (z_i - x_{Ni})^2 \right) \\
&= \sum_{i=1}^n \left( N z_i^2 - 2 z_i (x_{1i} + x_{2i} + \cdots + x_{Ni}) + x_{1i}^2 + \cdots + x_{Ni}^2 \right)
\end{align*}

(here $x_{ji}$ is the $i$th element of vector $x_j$)

• term $i$ in the sum is minimized by

\[
z_i = \frac{x_{1i} + x_{2i} + \cdots + x_{Ni}}{N}
\]

• solution $z$ is the componentwise average of the points $x_1, \ldots, x_N$:

\[
z = \frac{1}{N} (x_1 + x_2 + \cdots + x_N)
\]

K-means clustering

a very popular iterative algorithm for partitioning $N$ vectors in $K$ clusters

Algorithm

choose initial 'representatives' $z_1, \ldots, z_K$ for the $K$ clusters and repeat:

1. assign each vector $x_i$ to the nearest representative $z_j$
2. replace each representative $z_j$ by the mean of the vectors assigned to it

• can be shown to converge in a finite number of iterations
• initial representatives are often chosen randomly
• solution depends on choice of initial representatives
• in practice, often restarted a few times, with different starting points

[figures: example at iterations 1, 2, 3, 9, 10, 11, 12; each shows the assignment to clusters and the updated representatives]

Angle between vectors

the angle between nonzero real vectors $a$, $b$ is defined as

\[
\arccos \left( \frac{a^T b}{\|a\| \|b\|} \right)
\]

• this is the unique value of $\theta \in [0, \pi]$ that satisfies $a^T b = \|a\|\|b\| \cos\theta$

[figure: vectors $a$ and $b$ with angle $\theta$ between them]

• Cauchy-Schwarz inequality guarantees that

\[
-1 \leq \frac{a^T b}{\|a\| \|b\|} \leq 1
\]

Terminology

• $\theta = 0$: $a^T b = \|a\|\|b\|$; vectors are aligned or parallel
• $0 \leq \theta < \pi/2$: $a^T b > 0$; vectors make an acute angle
• $\theta = \pi/2$: $a^T b = 0$; vectors are orthogonal ($a \perp b$)
• $\pi/2 < \theta \leq \pi$: $a^T b < 0$; vectors make an obtuse angle
• $\theta = \pi$: $a^T b$
$= -\|a\|\|b\|$; vectors are anti-aligned or opposed

Orthogonal decomposition

given a nonzero $a \in \mathbf{R}^n$, every $n$-vector $x$ can be decomposed as

\[
x = ta + y \quad \text{with } y \perp a, \qquad t = \frac{a^T x}{\|a\|^2}, \qquad y = x - \frac{a^T x}{\|a\|^2} a
\]

[figure: $x$ decomposed into $ta$ along $a$ and $y$ orthogonal to $a$]

• proof is by inspection
• decomposition (i.e., $t$, $y$) exists and is unique for every $x$
• $ta$ is projection of $x$ on line through $a$ (see page 2-11)
• since $y \perp a$, we have $\|x\|^2 = \|ta\|^2 + \|y\|^2$

Correlation coefficient

the correlation coefficient between non-constant vectors $a$, $b$ is

\[
\rho = \frac{\tilde{a}^T \tilde{b}}{\|\tilde{a}\| \|\tilde{b}\|}
\]

where $\tilde{a} = a - \operatorname{avg}(a)\mathbf{1}$ and $\tilde{b} = b - \operatorname{avg}(b)\mathbf{1}$ are the de-meaned vectors

• only defined when $a$ and $b$ are not constant ($\tilde{a} \neq 0$ and $\tilde{b} \neq 0$)
• $\rho$ is the cosine of the angle between the de-meaned vectors
• $\rho$ is the average product of deviations from the mean in standard units

\[
\rho = \frac{1}{n} \sum_{i=1}^n \frac{(a_i - \operatorname{avg}(a))}{\operatorname{std}(a)} \, \frac{(b_i - \operatorname{avg}(b))}{\operatorname{std}(b)}
\]

Examples

[figures: three examples with $\rho = 0.97$, $\rho = -0.99$, $\rho = 0.004$; axes $k$, $a_k$, $b_k$]

Regression line

• scatterplot shows two $n$-vectors $a$, $b$ as $n$ points $(a_k, b_k)$
• straight line shows affine function $f(x) = c_1 + c_2 x$ with $f(a_k) \approx b_k$, $k = 1$, . . .
, $n$

Least-squares regression

use coefficients $c_1$, $c_2$ that minimize

\[
J = \sum_{k=1}^n (f(a_k) - b_k)^2
\]

• $J$ is a quadratic function of $c_1$ and $c_2$:

\begin{align*}
J &= \sum_{k=1}^n (c_1 + c_2 a_k - b_k)^2 \\
&= n c_1^2 + 2 (\mathbf{1}^T a) c_1 c_2 + \|a\|^2 c_2^2 - 2 (\mathbf{1}^T b) c_1 - 2 (a^T b) c_2 + \|b\|^2
\end{align*}

• to minimize $J$, set derivatives with respect to $c_1$, $c_2$ to zero:

\[
n c_1 + (\mathbf{1}^T a) c_2 = \mathbf{1}^T b, \qquad (\mathbf{1}^T a) c_1 + \|a\|^2 c_2 = a^T b
\]

• solution is

\[
c_2 = \frac{a^T b - (\mathbf{1}^T a)(\mathbf{1}^T b)/n}{\|a\|^2 - (\mathbf{1}^T a)^2/n}, \qquad
c_1 = \frac{\mathbf{1}^T b - (\mathbf{1}^T a) c_2}{n}
\]

Interpretation

slope $c_2$ can be written in terms of correlation coefficient $\rho$ of $a$ and $b$:

\[
c_2 = \frac{(a - \operatorname{avg}(a)\mathbf{1})^T (b - \operatorname{avg}(b)\mathbf{1})}{\|a - \operatorname{avg}(a)\mathbf{1}\|^2} = \rho \, \frac{\operatorname{std}(b)}{\operatorname{std}(a)}
\]

offset $c_1 = \operatorname{avg}(b) - \operatorname{avg}(a) c_2$

• hence, expression for regression line can be written as

\[
f(x) = \operatorname{avg}(b) + \rho \, \frac{\operatorname{std}(b)}{\operatorname{std}(a)} (x - \operatorname{avg}(a))
\]

• correlation coefficient $\rho$ is the slope after converting to standard units:

\[
\frac{f(x) - \operatorname{avg}(b)}{\operatorname{std}(b)} = \rho \, \frac{x - \operatorname{avg}(a)}{\operatorname{std}(a)}
\]

Examples

[figures: three examples with $\rho = 0.91$, $\rho = -0.89$, $\rho = 0.25$]

• dashed lines in top row show average ± standard deviation
• bottom row shows scatterplots of top row in standard units

Hyperplane

one linear equation in $n$ variables $x_1, x_2, \ldots, x_n$:

\[
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b
\]

in vector notation: $a^T x = b$

let $H$ be the set of solutions: $H = \{x \in \mathbf{R}^n \mid a^T x = b\}$

• $H$ is empty if $a_1 = a_2 = \cdots = a_n = 0$ and $b \neq 0$
• $H = \mathbf{R}^n$ if $a_1 = a_2 = \cdots = a_n = 0$ and $b = 0$
• $H$ is called a hyperplane if $a = (a_1, a_2, \ldots, a_n) \neq 0$
• for $n = 2$, a straight line in a plane; for $n = 3$, a plane in 3-D space, . . .

Example

[figure: hyperplanes $a^T x = b$ in the $(x_1, x_2)$ plane for $a = (2, 1)$ and $b = -15, -10, -5, 0, 5, 10, 15$]

Geometric interpretation of hyperplane

• recall formula for orthogonal decomposition of $x$ w.r.t.
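$a$ (page 2-25):

\[
x = \frac{a^T x}{\|a\|^2} a + y \quad \text{with } y \perp a
\]

• $x$ satisfies $a^T x = b$ if and only if

\[
x = \frac{b}{\|a\|^2} a + y \quad \text{with } y \perp a
\]

[figure: hyperplane $H$, vector $a$, the point $(b/\|a\|^2)a$, and a point $x = (b/\|a\|^2)a + y$ in $H$]

• point $(b/\|a\|^2)a$ is the intersection of hyperplane with line through $a$
• add arbitrary vectors $y \perp a$ to get all other points in hyperplane

Exercise: projection on hyperplane

• show that the point in $H = \{x \mid a^T x = b\}$ closest to $c \in \mathbf{R}^n$ is

\[
\tilde{x} = c - \frac{a^T c - b}{\|a\|^2} a
\]

• $\|c - \tilde{x}\| = \dfrac{|a^T c - b|}{\|a\|}$ is the distance of $c$ to the hyperplane

[figure: point $c$, its projection $\tilde{x}$ on $H$, and vector $a$]

Solution

we need to find $y$ in the decomposition

\[
\tilde{x} = \frac{b}{\|a\|^2} a + y \quad \text{with } y \perp a
\]

• decomposition of $c$ with respect to $a$ is

\[
c = \frac{a^T c}{\|a\|^2} a + d \quad \text{with } d = c - \frac{a^T c}{\|a\|^2} a
\]

• squared distance between $c$ and $\tilde{x}$ is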
