CS246 Midterm, Spring 2014 — Page: 1

UCLA Computer Science Department
Spring 2014
Instructor: J. Cho

Student Name and ID:

CS246 Midterm: Closed Book, 1 Hour 30 Minutes

Attach extra pages as needed. Write your name and ID on the extra pages. Please write neatly.

Exam Score:
Problem   1    2    3    4    5    Total
Score     20   20   20   20   20   100

Problem 1: 20 points
We want to identify the number of unique books available on Amazon. We consider two books to be different if their ISBN numbers are different. (An ISBN number is a sequence of 10 numeric digits assigned to a book.) We assume every book has an ISBN number. Describe an efficient way to estimate the number of unique books available on Amazon. Note that Amazon provides search interfaces on various book fields, including title, author, and ISBN. However, you are not allowed to download the entire Amazon Web site to count the books, and Amazon does not support “NOT ...” queries.

Problem 2: 20 points
You are a Web administrator in charge of all pages in the set $S$ shown below. As far as you know, Google uses the following TrustRank variation of PageRank:

$$TR(p_i) = 0.8 \sum_{p_j \in PARENT(p_i)} \frac{TR(p_j)}{c_j} + b_i,$$

where $PARENT(p_i)$ is the set of all pages that link to $p_i$ and $c_j$ is the number of outgoing links in $p_j$.

[Figure: link graph of the set S and the outside pages p1, p2, p3, and p4]

None of your pages in $S$ is trusted by Google (i.e., $b_i = 0$ for all $p_i \in S$). Fortunately, some pages outside of $S$ link to your pages, as shown in the graph. You know that the TrustRank scores of the pages outside of $S$ are as follows:

$$TR(p_1) = 0, \quad TR(p_2) = TR(p_3) = 0.06, \quad TR(p_4) = 0.01$$

Your task is to make the minimum number of modifications to the link structure among and/or from the pages in $S$ in order to (1) maximize the TrustRank sum of all pages in $S$ and (2) ensure that all pages in $S$ have non-zero $TR(p_i)$ values.

1.
Indicate such modifications directly on the graph below. In your modifications, you are only allowed to add and/or delete link(s) originating from page(s) in $S$. To get partial credit, you may want to briefly explain why you suggest your modifications. (15 points)

[Figure: copy of the link graph of the set S and the outside pages p1, p2, p3, and p4, for marking your modifications]

2. After your modification, what is the sum of the TrustRank scores of all pages in $S$? (5 points)

$$\sum_{p_i \in S} TR(p_i) = $$

Problem 3: 20 points
We perform the singular value decomposition $A = Q_1 D Q_2^T$ of the following $3 \times 3$ matrix $A$:

$$A = \begin{pmatrix} \frac{6\sqrt{2}}{5} & -1 & \frac{9}{5\sqrt{2}} \\ \frac{6\sqrt{2}}{5} & 1 & \frac{9}{5\sqrt{2}} \\ -\frac{3}{5} & 0 & \frac{4}{5} \end{pmatrix} = Q_1 D Q_2^T,$$

where $Q_1$ and $Q_2$ are orthonormal matrices and $D$ is a diagonal matrix.

1. Write down the three decomposed matrices in the following space (7 points):

$Q_1 =$

$D =$

$Q_2^T =$

Please note that the third matrix is $Q_2^T$, not $Q_2$. You may find the following information helpful in performing the decomposition: $A^T A$ is a symmetric matrix with three eigenvalues, 2, 1, and 9, whose corresponding eigenvectors are $(0, 1, 0)$, $(-3/5, 0, 4/5)$, and $(4/5, 0, 3/5)$, respectively. $A A^T$ is also a symmetric matrix with three eigenvalues, 1, 2, and 9, whose corresponding eigenvectors are $(0, 0, 1)$, $(-1/\sqrt{2}, 1/\sqrt{2}, 0)$, and $(1/\sqrt{2}, 1/\sqrt{2}, 0)$, respectively.

2. Please describe the linear transformation that the matrix $A$ represents as a combination of stretching and/or rotation. For stretching, make sure you specify the stretching direction(s) and stretching factor(s). For rotation, describe how three orthonormal (basis) vectors transform. (For example, you may say that the matrix corresponds to a rotation in which the three basis vectors $(1, 0, 0)$, $(0, 1/\sqrt{2}, 1/\sqrt{2})$, and $(0, -1/\sqrt{2}, 1/\sqrt{2})$ transform to $(0, 1, 0)$, $(3/5, 0, 4/5)$, and $(4/5, 0, -3/5)$, respectively.) (7 points)

3. Consider multiplying the matrix $A$ by a vector $X$ of length 1 (i.e., $|X| = 1$). What is the largest possible value of $|AX|$? What is the $X$ that gives such a $|AX|$?
(6 points)

Problem 4: 20 points
Consider a collection of 5 documents, $d_1, d_2, \ldots, d_5$. Each document contains 10 words. The entire collection contains three distinct words: Microsoft, office, and chair. The following diagram shows how many times each word appears in each document. For now, ignore the color of each circle in the diagram.

             d1       d2       d3       d4       d5
Microsoft                      ••◦      •••••    •••••••
office       ◦•◦      ◦◦◦•◦    •◦◦•     ◦•◦••    •◦•
chair        ◦◦◦◦◦•◦  ◦◦◦◦◦    •◦◦

For example, document $d_3$ contains the word Microsoft three times, office four times, and chair three times.

Through multiple iterations of the LDA algorithm, we have assigned each word in the documents to one of two topics, $z_1$ and $z_2$. The result so far is shown by the color of each circle in the diagram above: black for $z_1$ and white for $z_2$. For example, the first and third occurrences of office in $d_1$ have been assigned to $z_2$ (white dots), and the second occurrence of office in $d_1$ has been assigned to $z_1$ (a black dot). We are using $\alpha = 0$ and $\beta = 0$ as the parameter values of the LDA algorithm.

1. As the final step in our LDA iterations, we want to reassign the topic of the third office in $d_5$ to either $z_1$ or $z_2$. In the previous iteration, the word was assigned to $z_1$ (a black dot), as shown in the diagram. What should be the probability that the word is assigned to $z_1$ in the current iteration? What about $z_2$? Write down the two probabilities. (5 points)

2. Assume that the third office in $d_5$ was again assigned to $z_1$ in our last iteration. Given the final assignment of topics, write down the estimated values of the following probability vectors. (10 points)

Probability vector                                                          Values
$\langle P(z_1 \mid d_1), P(z_2 \mid d_1) \rangle$
$\langle P(z_1 \mid d_3), P(z_2 \mid d_3) \rangle$
$\langle P(z_1 \mid d_4), P(z_2 \mid d_4) \rangle$
$\langle P(Microsoft \mid z_1), P(office \mid z_1), P(chair \mid z_1) \rangle$
$\langle P(Microsoft \mid z_2), P(office \mid z_2), P(chair \mid z_2) \rangle$

3.
Given the query “Microsoft”, we decide to rank documents using TF-IDF weighting and the cosine similarity between the query and each document. Which documents will have non-zero scores under this scheme? What will be their relative ranking? For example, if $d_1$, $d_3$, and $d_4$ have non-zero scores, $d_1$ has a higher similarity score than $d_4$, and $d_4$ has a higher score than $d_3$, write your answer as “d1 > d4 > d3”. (5 points)

Problem 5: 20 points
You want to simulate incoming phone calls at your company’s call center. You know that the call arrivals follow a Poisson process very closely. On average, your call center receives 20 phone calls per hour. To perform the simulation, you decide to discretize time into one-second units and generate a call arrival event at the beginning of every second with probability $p$.

1. Compute the value of the probability $p$ to be used for this simulation. (10 points)

2. What is the probability that your call center does not get any phone call for $x$ seconds? (10 points)
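As an illustration rather than the official solution, the following Python sketch relates the Poisson model in Problem 5 to its one-second Bernoulli discretization. It assumes the stated rate of 20 calls per hour ($\lambda = 1/180$ calls per second) and compares two common choices of $p$: the first-order approximation $\lambda \Delta t$ and the exact probability $1 - e^{-\lambda \Delta t}$ of at least one Poisson arrival in a one-second slot. All function and constant names are hypothetical.

```python
from __future__ import annotations

import math
import random

# Constant names are hypothetical; the rate of 20 calls per hour comes from Problem 5.
CALLS_PER_HOUR = 20
RATE_PER_SECOND = CALLS_PER_HOUR / 3600.0  # lambda = 1/180 calls per second


def p_linear(rate: float, dt: float = 1.0) -> float:
    """First-order approximation: P(arrival in a slot of length dt) ~ rate * dt."""
    return rate * dt


def p_at_least_one(rate: float, dt: float = 1.0) -> float:
    """Poisson probability of at least one arrival in an interval of length dt."""
    return 1.0 - math.exp(-rate * dt)


def prob_no_call(p: float, x: int) -> float:
    """Probability that none of x independent one-second slots generates a call."""
    return (1.0 - p) ** x


def simulate_arrivals(p: float, seconds: int, seed: int = 0) -> list[int]:
    """Return the (0-based) seconds at which the Bernoulli simulation generates a call."""
    rng = random.Random(seed)
    return [t for t in range(seconds) if rng.random() < p]


if __name__ == "__main__":
    p_approx = p_linear(RATE_PER_SECOND)
    p_exact = p_at_least_one(RATE_PER_SECOND)
    print(f"rate * dt           = {p_approx:.6f}")
    print(f"1 - exp(-rate * dt) = {p_exact:.6f}")
    # Compare the discretized no-call probability with the Poisson value exp(-lambda * x).
    x = 600
    print(f"P(no call in {x} s): (1 - p)^x = {prob_no_call(p_exact, x):.6f}, "
          f"exp(-lambda * x) = {math.exp(-RATE_PER_SECOND * x):.6f}")
    # Sanity check: the simulated hourly rate should be close to 20.
    hours = 100
    calls = simulate_arrivals(p_exact, 3600 * hours)
    print(f"simulated calls per hour over {hours} hours: {len(calls) / hours:.2f}")
```

With $p = 1 - e^{-\lambda}$, the discretized model gives $(1 - p)^x = e^{-\lambda x}$, which matches the Poisson probability of no arrivals in $x$ seconds; with $p = \lambda$ the two agree only approximately, which is close here because $\lambda$ is small. Note that the simulation treats each second as an independent Bernoulli trial, so it can produce at most one call per second.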