3/14/2012
OKAN UNIVERSITY
FACULTY OF ENGINEERING AND ARCHITECTURE
MATH 256 Probability and Random Processes
Lecture 04: Random Variables
Fall 2011
Yrd. Doç. Dr. Didem Kivanc Tureli
didemk@ieee.org
didem.kivanc@okan.edu.tr
4/10/2011, Lecture 3
What is a random variable
• Random Variables
– A random variable X is not a “variable” in the algebraic sense.
– A random variable X is a function:
• from the set of outcomes of a random event (the sample space S of an experiment)
• to the set of real numbers.
• Realizations of a random variable are called random variates.
[Figure: the random variable X(ξ) maps the set S of outcomes of a coin toss into ℝ: heads → 1, tails → −1.]
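The mapping in the figure can be written as a plain Python function from outcomes to real numbers (a small sketch; the names `S` and `X` simply mirror the figure):

```python
# A random variable is a function from the sample space S to the reals.
# Here S = {"heads", "tails"} and X maps heads -> 1, tails -> -1.
S = ["heads", "tails"]

def X(outcome):
    return 1 if outcome == "heads" else -1

# One realization (random variate) per outcome of the coin toss.
print([X(s) for s in S])
```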
Example
• Experiment: toss 3 coins.
• Sample space: S = {(H,H,H), (H,H,T), (H,T,H), (T,H,H), (H,T,T), (T,H,T), (T,T,H), (T,T,T)}
• Y is a random variable giving the number of heads that landed:
P(Y = 0) = 1/8   from (T,T,T)
P(Y = 1) = 3/8   from (H,T,T), (T,H,T), (T,T,H)
P(Y = 2) = 3/8   from (H,H,T), (H,T,H), (T,H,H)
P(Y = 3) = 1/8   from (H,H,H)
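The pmf above can be checked by brute-force enumeration of the 8 equally likely outcomes (a Python sketch, assuming a fair coin as the slide does):

```python
from itertools import product
from fractions import Fraction

# Enumerate all 8 outcomes of tossing 3 coins and count heads.
outcomes = list(product("HT", repeat=3))
pmf = {k: Fraction(sum(1 for o in outcomes if o.count("H") == k), len(outcomes))
       for k in range(4)}
print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```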
Three balls are to be randomly selected without replacement from an urn containing 20 balls numbered 1 through 20. If we bet that at least one of the balls that are drawn has a number as large as or larger than 17, what is the probability that we win the bet?
Let X be the largest of the three numbers drawn. Then

P{X = i} = C(i−1, 2) / C(20, 3),   i = 3, 4, …, 20

since the other two balls must both come from the i − 1 balls numbered below i.
P{X ≥ 17} = P{X = 17} + P{X = 18} + P{X = 19} + P{X = 20} ≈ 0.508
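As a quick numerical check of the 0.508 figure, the formula can be evaluated directly (a Python sketch; `math.comb` is the standard-library binomial coefficient):

```python
from math import comb
from fractions import Fraction

# P{X = i} = C(i-1, 2) / C(20, 3): the two smaller balls must come
# from the i-1 balls numbered below i.
def p(i):
    return Fraction(comb(i - 1, 2), comb(20, 3))

win = sum(p(i) for i in range(17, 21))
print(win, float(win))  # 29/57, about 0.508
```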
Independent trials consisting of the flipping of a coin having probability p of coming up heads are continually performed until either a head occurs or a total of n flips is made. If we let X denote the number of times the coin is flipped, then X is a random variable taking on one of the values 1, 2, 3, …, n with respective probabilities:

P{X = 1} = p
P{X = 2} = (1 − p) p
P{X = 3} = (1 − p)² p
⋮
P{X = n − 1} = (1 − p)^(n−2) p
P{X = n} = (1 − p)^(n−1)
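A sanity check that these probabilities sum to one, sketched in Python for the illustrative values n = 5 and p = 1/3:

```python
from fractions import Fraction

# pmf of X, the number of flips until the first head or until n flips:
# P{X=k} = (1-p)^(k-1) * p for k < n, and P{X=n} = (1-p)^(n-1).
def pmf(k, n, p):
    if k < n:
        return (1 - p) ** (k - 1) * p
    return (1 - p) ** (n - 1)

n, p = 5, Fraction(1, 3)
total = sum(pmf(k, n, p) for k in range(1, n + 1))
print(total)  # 1
```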
Three balls are randomly chosen from an urn containing 3 white, 3 red, and 5 black balls. Suppose that we win $1 for each white ball selected and lose $1 for each red ball selected. If we let X denote our total winnings from the experiment, then X is a random variable taking on the possible values −3, −2, −1, 0, 1, 2, 3 with respective probabilities computed below.

Suppose every ball has a number. Then the balls are:
W1, W2, W3, R1, R2, R3, B1, B2, B3, B4, B5
or, for convenience, number them from 1 to 11. So there are C(11, 3) = 165 ways to choose three balls from this set.
The list of possible values for X is {−3, −2, −1, 0, 1, 2, 3}.
To get −3, we must choose RRR.
To get −2, we must choose two R and one B.
To get −1, we must choose two R and one W, or one R and two B.
To get 0, we must choose one R, one W and one B, or BBB.
To get +1, we must choose two W and one R, or one W and two B.
To get +2, we must choose two W and one B.
To get +3, we must choose WWW.
So:
P{X = −3} = P{X = 3} = C(3,3) / C(11,3) = 1/165
P{X = −2} = P{X = 2} = C(3,2) C(5,1) / C(11,3) = 15/165

P{X = −1} = P{X = 1} = [C(3,2) C(3,1) + C(3,1) C(5,2)] / C(11,3) = 39/165

P{X = 0} = [C(5,3) + C(3,1) C(3,1) C(5,1)] / C(11,3) = 55/165
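These counts can be verified by enumerating all C(11, 3) = 165 hands directly (a brute-force Python sketch):

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

# Winnings: +1 per white, -1 per red, 0 per black ball drawn.
balls = ["W"] * 3 + ["R"] * 3 + ["B"] * 5
wins = Counter(hand.count("W") - hand.count("R")
               for hand in combinations(balls, 3))
total = sum(wins.values())  # C(11, 3) = 165 equally likely hands
pmf = {x: Fraction(c, total) for x, c in sorted(wins.items())}
print(total, pmf)
```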
The cumulative distribution function
• For a random variable X, the function F defined by
F(x) = P{X ≤ x},   −∞ < x < ∞
• is called the cumulative distribution function, or simply the distribution function, of X.
• Thus the distribution function specifies, for every real value x, the probability that the random variable is less than or equal to x.
• F(x) is a nondecreasing function of x; that is, if a < b then F(a) ≤ F(b).
For the previous example:
P{X = −3} = P{X = 3} = 1/165
P{X = −2} = P{X = 2} = 15/165
P{X = −1} = P{X = 1} = 39/165
P{X = 0} = 55/165
For the previous example:
F(−3) = 1/165
F(−2) = 1/165 + 15/165 = 16/165
F(−1) = 1/165 + 15/165 + 39/165 = 55/165
F(0)  = 1/165 + 15/165 + 39/165 + 55/165 = 110/165
F(+1) = 1/165 + 15/165 + 39/165 + 55/165 + 39/165 = 149/165
F(+2) = 1/165 + 15/165 + 39/165 + 55/165 + 39/165 + 15/165 = 164/165
F(+3) = 1/165 + 15/165 + 39/165 + 55/165 + 39/165 + 15/165 + 1/165 = 165/165 = 1
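The same table can be produced by accumulating the pmf (a short Python sketch):

```python
from fractions import Fraction

# F(a) is the running sum of the pmf over values <= a.
counts = {-3: 1, -2: 15, -1: 39, 0: 55, 1: 39, 2: 15, 3: 1}
F, running = {}, Fraction(0)
for x in sorted(counts):
    running += Fraction(counts[x], 165)
    F[x] = running
print(F[0], F[3])  # 2/3 and 1
```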
Probability Mass Function
• Defined for a discrete random variable X by
p(a) = P{X = a}
• Suppose X takes one of the values x1, x2, …, so that
p(xi) ≥ 0 for i = 1, 2, …
p(x) = 0 for all other values of x
• Then, since X must take one of the values xi,
∑_{i=1}^∞ p(xi) = 1
Example of probability mass function
p(0) = P{X = 0} = 1/4
p(1) = P{X = 1} = 1/2
p(2) = P{X = 2} = 1/4
Example
• The probability mass function of a random variable X is given by p(i) = c λ^i / i!, i = 0, 1, 2, …, where λ is some positive value.
• Find (a) P{X = 0} and (b) P{X > 2}.

Since the probabilities must sum to one,
∑_{i=0}^∞ p(i) = ∑_{i=0}^∞ c λ^i / i! = c e^λ = 1   (using e^x = ∑_{i=0}^∞ x^i / i!)
so c = e^{−λ}. Then
(a) P{X = 0} = p(0) = e^{−λ}
(b) P{X > 2} = 1 − P{X = 0} − P{X = 1} − P{X = 2} = 1 − e^{−λ}(1 + λ + λ²/2)
The cumulative distribution function
• The cumulative distribution function F can be expressed in terms of p(a) by
F(a) = ∑_{all x ≤ a} p(x)
• If X is a discrete random variable whose possible values are x1 < x2 < x3 < …, then the distribution function F of X is a step function.
Example
• For example, suppose the probability mass function (pmf) of X is
p(1) = 1/4,  p(2) = 1/2,  p(3) = 1/8,  p(4) = 1/8
• then the distribution function F of X is
F(a) = 0     for a < 1
       1/4   for 1 ≤ a < 2
       3/4   for 2 ≤ a < 3
       7/8   for 3 ≤ a < 4
       1     for 4 ≤ a
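The step function can be sketched directly in Python; each jump of F occurs at a possible value of X:

```python
# Step-function CDF for the pmf p(1)=1/4, p(2)=1/2, p(3)=1/8, p(4)=1/8.
def F(a):
    pmf = {1: 0.25, 2: 0.5, 3: 0.125, 4: 0.125}
    return sum(p for x, p in pmf.items() if x <= a)

print(F(0.5), F(1.5), F(2.5), F(3.5), F(10))  # 0 0.25 0.75 0.875 1.0
```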
Expectation of a random variable
• If X is a discrete random variable having a probability mass function p(x), then the expectation (or expected value) of X, denoted by E[X], is defined by
E[X] = ∑_{x: p(x) > 0} x p(x)
• In other words:
• take every possible value of X,
• multiply it by the probability of getting that value,
• and add the results.
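This recipe translates directly into code; the sketch below applies it to a fair die (values 1 through 6, each with probability 1/6):

```python
from fractions import Fraction

# E[X] = sum of x * p(x) over all values with positive probability.
def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

die = {i: Fraction(1, 6) for i in range(1, 7)}
print(expectation(die))  # 7/2
```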
Examples of expectation
• For example, suppose you have a fair coin. You flip the coin, and define a random variable X such that
– if the coin lands heads, X = 1
– if the coin lands tails, X = 2
• Then the probability mass function of X is given by
p(1) = p(2) = 1/2
or we can write
p(x) = 1/2 if x = 1 or x = 2, 0 otherwise.
E[X] = 1 × 1/2 + 2 × 1/2 = 1.5
Examples of expectation
• Next, suppose you throw a fair die, and define a random variable Y such that
– if the die lands on a number less than or equal to 5, then Y = 0
– if the die lands on a number greater than 5, then Y = 1
• Then the probability mass function of Y is given by
p(y) = Pr{Y = y} = 5/6 if y = 0, 1/6 if y = 1, 0 otherwise.
E[Y] = 0 × 5/6 + 1 × 1/6 = 1/6
Frequency interpretation of probabilities
• The law of large numbers – which we will see in Chapter 8 – says that if we have an experiment (e.g. tossing a coin) and we perform it an infinite number of times, then the proportion of time that any event E occurs will be P(E).
• [Recall here that an event means a subset of the sample space, i.e. a set of outcomes of the experiment.]
• So for instance suppose X is a random variable which will be equal to x1 with probability p(x1), x2 with probability p(x2), …, xn with probability p(xn).
• By the frequency interpretation, if we keep playing this game, then the proportion of time that we win xi will be p(xi).
Frequency interpretation of probabilities
• Or we can say that when we play the game N times, where N is a very big number, we will win xi about N p(xi) times.
• Then the average winnings per game will be:
[x1 × (no. of times we won x1) + x2 × (no. of times we won x2) + … + xn × (no. of times we won xn)] / (no. of times we played)
= [x1 N p(x1) + x2 N p(x2) + … + xn N p(xn)] / N = ∑_{i=1}^n xi p(xi) = E[X]
Example 3a
• Question:
– Find E[X] where X is the outcome when we roll a fair die.
• Solution:
– Since p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6,
E[X] = 1·p(1) + 2·p(2) + 3·p(3) + 4·p(4) + 5·p(5) + 6·p(6)
     = (1/6)(1 + 2 + 3 + 4 + 5 + 6) = (1/6)(6·7/2) = 3.5
Example 3b
• Question:
– We say that I is an indicator variable for an event A if
I = 1 if A occurs, 0 if A^c occurs
– What is E[I]?
E[I] = 1·P(A) + 0·P(A^c) = P(A)
Example 3d
• A school class of 120 students is driven in 3 buses to a symphonic performance. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X denote the number of students on the bus of that randomly chosen student, and find E[X].
• Solution:
E[X] = 36 · Pr{student is on 1st bus} + 40 · Pr{student is on 2nd bus} + 44 · Pr{student is on 3rd bus}
Pr{student is on 1st bus} = 36/120
Pr{student is on 2nd bus} = 40/120
Pr{student is on 3rd bus} = 44/120
E[X] = 36 · 36/120 + 40 · 40/120 + 44 · 44/120 ≈ 40.27
Example 3d
• Same problem as before, but assume that the bus is chosen randomly instead of the student, and find E[X].
• Solution:
E[X] = 36 · Pr{1st bus is chosen} + 40 · Pr{2nd bus is chosen} + 44 · Pr{3rd bus is chosen}
Pr{1st bus is chosen} = Pr{2nd bus is chosen} = Pr{3rd bus is chosen} = 1/3
E[X] = 36 · 1/3 + 40 · 1/3 + 44 · 1/3 = 40.00
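Both versions of the bus example can be computed side by side (a Python sketch using exact arithmetic via `Fraction`):

```python
from fractions import Fraction

sizes = [36, 40, 44]
total = sum(sizes)  # 120 students

# Pick a student uniformly: bus with s students is chosen w.p. s/120.
E_student = sum(Fraction(s, total) * s for s in sizes)
# Pick a bus uniformly: each bus is chosen w.p. 1/3.
E_bus = sum(Fraction(1, 3) * s for s in sizes)

print(float(E_student), float(E_bus))  # about 40.27 vs 40.0
```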
Expectation of a function of a random variable
• To find E[g(X)], that is, the expectation of g(X), use a two-step process:
– find the pmf of g(X)
– find E[g(X)] from that pmf
Let X denote a random variable that takes on any of the values −1, 0, and 1 with respective probabilities
P{X = −1} = 0.2,  P{X = 0} = 0.5,  P{X = 1} = 0.3
Compute E[X²].
Solution
Let Y = X². Then
P{Y = 1} = P{X = −1} + P{X = 1} = 0.5
P{Y = 0} = P{X = 0} = 0.5
so the probability mass function of Y is given by
p(y) = 0.5 if y = 0 or y = 1, 0 otherwise.
E[X²] = E[Y] = 1(0.5) + 0(0.5) = 0.5
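The two-step process is easy to mechanize; the sketch below builds the pmf of Y = X² and then averages it:

```python
# Two-step computation of E[X^2]: pmf of Y = X^2, then its expectation.
pmf_X = {-1: 0.2, 0: 0.5, 1: 0.3}

pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[x ** 2] = pmf_Y.get(x ** 2, 0) + p  # -1 and +1 both map to 1

E_Y = sum(y * p for y, p in pmf_Y.items())
print(pmf_Y, E_Y)
```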
Statistics vs. Probability
• You may have noticed that the concept of “expectation” seems a lot like the concept of “average”.
• So why do we use this fancy new word “expectation”? Why not just call it “average”?
• We find the average of a list of numbers; the numbers are already known.
• We find the expectation of a random variable; we may have only one such random variable, and we may only toss the coin or die once.
Statistics vs. Probability
• For instance, let us define a random variable X using the result of a coin toss: let X = 1 if the coin lands heads, X = 0 if the coin lands tails.
• If we perform this experiment K times, we will get a list of values for X. We can find the average value of X by adding all the values for X and dividing by K:
(1/K) ∑_{i=1}^K Xi
• Is this coin fair? We don’t know, but we can find out:
p(1) = Pr{X = 1} = (number of times the coin lands heads) / K
p(0) = Pr{X = 0} = (number of times the coin lands tails) / K
Statistics vs. Probability
• What we did on the previous slide was statistics: we analyzed the data to draw some conclusions about the process or mechanism (i.e. the coin) that generated that data.
• Probability is how we draw conclusions about the future.
• So suppose I did the experiments on the previous slide yesterday. Today I will come into the class and toss the coin exactly once.
• Then I can use the statistics from yesterday to help find out what I can expect the result of the coin toss to be today:
E[X] = ∑_{i=0}^1 i p(i) = 0 · p(0) + 1 · p(1)
Statistics vs. Probability
• Okay, so I got 0.5.
• What does this mean? X can never equal 0.5.
• Expectation makes more sense with continuous random variables, e.g. when you measure a voltage on a voltmeter.
• With the coin toss you can think of it this way:
• Suppose someone wants you to guess X. But you will pay a lot of money if you’re wrong, and the money you pay is proportional to how wrong you are.
• If you guess g, and the result was actually a, then you have to pay 100(g − a)².
• What should you guess?
• You must minimize (g − 1)² p(1) + (g − 0)² p(0).
• If you guess g = E[X], then this penalty is minimized.
Statistics: how to find the pmf of a random voltage from measurements
• Suppose you are going to measure a voltage. You know that the voltage is really about 5 V, but you have an old voltmeter that doesn’t measure very well. The voltmeter is digital and has 1 decimal place, so you can only read voltages 0.0, 0.1, …, 4.7, 4.8, 4.9, 5.0, 5.1, …, 9.9.
• You start measuring the voltage. You get the following measurements: 4.7, 5.0, 4.9, 5.0, 5.3, 4.9, 4.8, 5.2, …
• From these measurements you can construct a probability mass function graph as follows.
Pmf drawn from results of experiment
Measurements: 4.7, 5.0, 4.9, 5.0, 5.3, 4.9, 4.8, 5.2, 5.0, 4.5, 4.8, 5.1, 5.0, 5.1, 4.9, 5.3, 5.1, 5.2, 5.1, 5.4
[Figure: histogram over the values 4.5, 4.6, …, 5.5, with one square stacked per measurement.]
pmf derived mathematically
• Based on the frequency interpretation, we can define the pmf as follows:
p(4.5) = 1/20    p(5.1) = 4/20
p(4.6) = 0       p(5.2) = 2/20
p(4.7) = 1/20    p(5.3) = 2/20
p(4.8) = 2/20    p(5.4) = 1/20
p(4.9) = 3/20    p(5.5) = 0
p(5.0) = 4/20
• Now I can predict the future based on this pmf.
• Probability does not bother with data. Statistics is all about data.
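The same pmf falls out of counting the measurements (a Python sketch using `Counter`):

```python
from collections import Counter
from fractions import Fraction

# Empirical pmf: the fraction of the 20 readings landing on each value.
readings = [4.7, 5.0, 4.9, 5.0, 5.3, 4.9, 4.8, 5.2, 5.0, 4.5,
            4.8, 5.1, 5.0, 5.1, 4.9, 5.3, 5.1, 5.2, 5.1, 5.4]
counts = Counter(readings)
pmf = {v: Fraction(c, len(readings)) for v, c in sorted(counts.items())}
print(pmf)
```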
Statistics vs. Probability
• Are these the correct probabilities? I don’t know. Even if we ran the experiment millions of times, we would probably still be a little wrong, maybe even very wrong. It is always possible to throw 1000 heads in a row even with a fair coin, although it is very unlikely that this will happen.
• In any case, when studying probability we are not concerned with whether the pmf is correct for this experiment, because we do not care about experiments or data.
• The statisticians, or the people who designed this experiment, must take care to design it well, so they can give us a good statistical model.
• All we know is the statistical model (that is, the pmf), and we derive, mathematically, predictions about the future based on this pmf.
Variance
• Consider the pmf for the following three variables:
W = 0 with probability 1
Y = −1 with probability 1/2, +1 with probability 1/2
Z = −10 with probability 1/2, +10 with probability 1/2
• All three variables have the same expectation, but their probability mass functions are very different: W is always the same, Y changes a bit, Z changes a lot. The variance quantifies these changes.
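Using the standard definition Var(X) = E[(X − E[X])²] (developed on the following slides), a small sketch already shows how variance separates W, Y, and Z:

```python
from fractions import Fraction

# Var(X) = E[(X - E[X])^2]: expected squared deviation from the mean.
def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

W = {0: Fraction(1)}
Y = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
Z = {-10: Fraction(1, 2), 10: Fraction(1, 2)}
print(variance(W), variance(Y), variance(Z))  # 0 1 100
```

All three distributions have mean 0, yet the variances 0, 1, and 100 capture exactly how much each variable spreads around that mean.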