Homework I, Advanced algorithms 2014

Before you start:
1. The deadlines in this course are strict. This homework set is due on October 17 at 13.00 and
should be delivered on paper in the mail slot of Johan Håstad on level 4 on Lindstedtsvägen 3.
2. This homework is supposed to be done individually.
3. Note that in problems with subproblems, the first number given is the total number of points for
the problem and later there is information on how this total is distributed over the subproblems.
4. When asked to solve a computational problem by hand please submit, in some form or other,
the useful calculations that lead to the answer.
5. Unless explicitly instructed to do so, you are not supposed to search for the answer to a problem
on the Internet. You are of course allowed to look for general information on the Internet. Let
us give two clarifying examples. For the problem on Chinese Remainder Theorem you can of
course study information on the Chinese Remainder Theorem in general. We hope it is equally
clear to you that on the problem of finding a lower bound on the number of comparisons to
compute the median, you are not supposed to look for sources on the Internet proving exactly
this fact. If you are in doubt what you can do, contact Johan Håstad or one of the teaching
assistants.
The problems are given in no particular order. If something seems wrong, then visit
http://www.csc.kth.se/utbildning/kth/kurser/DD2440/avalg14/homework to see if any
errata were posted. If this does not help, then email johanh@kth.se. Don’t forget to prefix your email
subject with Avalg14.
1
(8p) As we have discussed in class, a generator in $Z_p$ is a number $g$ such that the powers of $g$ give all
non-zero elements of $Z_p$. For instance, 3 is a generator in $Z_7$ as its powers are (in order) 3, 2, 6, 4, 5, 1.
Find all generators in $Z_{31}$ and for each generator $g$ found, find the discrete logarithm of 2. In other words,
solve $g^y \equiv 2 \pmod{31}$ for each $g$. You are supposed to do this by hand.
Hint: If you want to save yourself computational work, finding one generator and all its powers in
order should go a long way towards finding both all the generators and the discrete logs of 2.
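The definition of a generator can be checked mechanically on the small example from the text. The following Python sketch lists the powers of g modulo a prime p and tests whether they cover all non-zero elements. The function names `powers` and `is_generator` are illustrative only and not part of the course material, and the sketch is run only on Z7, since the problem itself is to be solved by hand.

```python
def powers(g, p):
    """Return [g^1, g^2, ..., g^(p-1)] reduced modulo p."""
    result = []
    value = 1
    for _ in range(p - 1):
        value = (value * g) % p
        result.append(value)
    return result


def is_generator(g, p):
    """True if the powers of g hit every non-zero element of Z_p (p prime)."""
    return set(powers(g, p)) == set(range(1, p))


print(powers(3, 7))        # [3, 2, 6, 4, 5, 1], as in the example above
print(is_generator(3, 7))  # True
print(is_generator(2, 7))  # False: the powers of 2 are 2, 4, 1, 2, 4, 1
```

In this small example the discrete logarithm of 2 to the base 3 can be read off from the position of 2 in the list: it is the second power, so y = 2.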
2
(8p) We proved in class that given a set $S$ of $N$ pairwise distinct strings $(x_i)_{i=1}^{N}$ in $\{0,1\}^n$ and a random
matrix $H$ of size $m \times n$, the number of pairs $(x_i, x_j)$ with $i \neq j$ such that $Hx_i = Hx_j$ (note that these
are $m$-bit strings) is expected to be $N(N-1)2^{-1-m}$. In particular, if $m \geq \log N$ we are likely to have a
reasonably good hash function. This is not always the case.
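To make the quantity in this statement concrete, the following Python sketch evaluates the expected number of colliding pairs and counts the actual colliding pairs in a list of hash values. The function names `expected_collisions` and `count_collisions` and the sample numbers are assumptions made for illustration; pairs are counted as unordered, which matches the factor $2^{-1}$ in the formula.

```python
from collections import Counter


def expected_collisions(N, m):
    """Expected number of colliding pairs for a random m x n matrix: N(N-1) * 2^(-1-m)."""
    return N * (N - 1) * 2.0 ** (-1 - m)


def count_collisions(hash_values):
    """Count unordered pairs of positions i != j whose hash values coincide."""
    counts = Counter(hash_values)
    return sum(c * (c - 1) // 2 for c in counts.values())


print(expected_collisions(1024, 10))      # 511.5
print(count_collisions([5, 5, 7, 7, 7]))  # 4 (one pair of 5s, three pairs of 7s)
```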
2a
(2p) Construct one bad example in the form of a set S and a non-zero matrix H such that all strings
in S hash onto the same value.
2b
(3p) Construct a bad set S such that for many matrices H, all strings in S map to the same string.
Finding the maximally bad S is needed for a full score but you need not make a formal proof that
it is optimal.
Page 1 (of 2)
Advanced algorithms • Fall 2014
Johan Håstad
2c
(3p) Do give a formal proof that the S you constructed in the previous sub-problem is the worst
possible in this respect.
3
(8p) Let P4 be the first four digits of your personal number (i.e. YYMM, the year and month you were
born) and let p be the smallest prime larger than P4 and q the smallest prime larger than p. You may find
p and q by computer but the following you should do by hand. Find a number x such that
$$x \equiv \begin{cases} 4711 \pmod{p} \\ 1 \pmod{q} \end{cases}$$
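A result obtained by hand can be cross-checked with a few lines of code. The following Python sketch uses the standard Chinese Remainder Theorem construction for two coprime moduli; the function name `crt_pair` and the small moduli 5 and 7 are hypothetical and chosen only for illustration. It relies on the three-argument `pow` with exponent -1, which requires Python 3.8 or later.

```python
def crt_pair(a, p, b, q):
    """Return x in [0, p*q) with x ≡ a (mod p) and x ≡ b (mod q), for coprime p, q."""
    a %= p
    # Inverse of p modulo q (three-argument pow with exponent -1, Python 3.8+).
    p_inv = pow(p, -1, q)
    # Write x = a + p*t and choose t so that a + p*t ≡ b (mod q).
    t = ((b - a) * p_inv) % q
    return a + p * t


x = crt_pair(3, 5, 1, 7)
print(x, x % 5, x % 7)  # 8 3 1
```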
4
(8p) Your task is to study a double hash¹ table. In the file
http://www.csc.kth.se/utbildning/kth/kurser/DD2440/avalg14/homework/input
you find $2^{22}$ entries, each a bit-string of length 64 written in hex. In fact they are sorted.
4a
(5p) Construct a double hash table for this set of data. In particular, construct a hash function $H$
given by a $22 \times 64$ matrix such that the total number of collisions under $H$ is bounded by $2^{22}$.
Then, for each $i$ such that at least three elements map to $i$ under $H$, construct a hash function $H_i$
mapping 64 bits to the fewest number of bits possible such that all $x$ with $Hx = i$ map to distinct
values under $H_i$. Do not only give the resulting functions but also give a short account of your efforts.
In particular, report statistics on the number of $H_i$ needed and how many there are of each number of output
bits. Please make the description of the functions used available in a directory with public access
(and specify its location in your solution). The first 22 lines should contain the rows of the matrix
$H$ (each as 16 hex characters). Then, for each $i$ with at least three preimages, give first a line with $i$
and the number $s_i$ of output bits, and then $s_i$ lines with the rows of the corresponding
matrix.
Hint: To compute this set of hash functions it might be useful to use the machine operation that
takes the bit-wise AND of two machine words.
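One way to read the hint is sketched below in Python: if each row of the matrix and the input are packed into 64-bit words, then one output bit is the parity of the bit-wise AND of a row with the input. The function names `parity` and `hash_gf2`, the sample rows, and the convention that the first row produces the most significant output bit are all assumptions made for illustration, not a prescribed format.

```python
def parity(word):
    """Parity (sum modulo 2) of the bits of a non-negative integer."""
    return bin(word).count("1") & 1


def hash_gf2(rows, x):
    """Compute Hx over GF(2); rows[k] is row k of H packed into an integer."""
    h = 0
    for row in rows:
        # Bit-wise AND keeps the positions where both the row and x have a 1;
        # the parity of that word is the inner product modulo 2.
        h = (h << 1) | parity(row & x)
    return h


# Hypothetical 2 x 64 matrix, rows given as 16 hex characters each.
rows = [int("00000000000000ff", 16), int("8000000000000001", 16)]
print(hash_gf2(rows, int("8000000000000003", 16)))  # 0
print(hash_gf2(rows, int("0000000000000007", 16)))  # 3
```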
4b
(3p) Discuss if this double hash table would be your favorite way to do search queries in this data
set. What method would you use and why?
5
(8p) In class we proved that if we are in a comparison model then n − 1 comparisons are needed to find
the median in a set of n inputs. Your task is to, for odd n, improve this bound to (3n − 3)/2 assuming that
the algorithm takes (n − 1)/2 disjoint pairs and starts by making the comparisons given by these pairs.
In other words, you need to prove that the algorithm needs to make an additional n − 1 comparisons after
this start.
Hint: Prove that if the median is the element not taking part in any of the initial comparisons then
you might need n − 1 more comparisons to verify this fact.
¹ This refers to double hashing as described in Section 20.3 in the lecture notes by Håstad. The most common notion of
double hashing is something else and the lecture notes should be updated.