Algorithms ・ 3.1 S

Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
Data structures
“ Smart data structures and dumb code works a lot better
than the other way around. ” – Eric S. Raymond
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
F O U R T H
‣ ordered operations
E D I T I O N
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
2
Symbol tables
Key-value pair abstraction.
・Insert a value with specified key.
・Given a key, search for the corresponding value.
3.1 S YMBOL T ABLES
‣ API
Ex. DNS lookup.
・Insert domain name with specified IP address.
・Given domain name, find corresponding IP address.
‣ elementary implementations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
‣ ordered operations
domain name
IP address
www.cs.princeton.edu
128.112.136.11
www.princeton.edu
128.112.128.15
www.yale.edu
130.132.143.21
www.harvard.edu
128.103.060.55
www.simpsons.com
209.052.165.60
key
value
4
Symbol table applications
Symbol tables: context
Also known as: maps, dictionaries, associative arrays.
application
purpose of search
key
value
dictionary
find definition
word
definition
book index
find relevant pages
term
list of page numbers
file share
find song to download
name of song
computer ID
financial account
process transactions
account number
transaction details
web search
find relevant web pages
keyword
list of page names
compiler
find properties of variables
variable name
type and value
routing table
route Internet packets
destination
best route
DNS
find IP address
domain name
IP address
reverse DNS
find domain name
IP address
domain name
genomics
find markers
DNA string
known positions
file system
find file on disk
filename
location on disk
Generalizes arrays. Keys need not be between 0 and N – 1.
Language support.
・External libraries: C, VisualBasic, Standard ML, bash, ...
・Built-in libraries: Java, C#, C++, Scala, ...
・Built-in to language: Awk, Perl, PHP, Tcl, JavaScript, Python, Ruby, Lua.
every array is an
associative array
every object is an
associative array
hasNiceSyntaxForAssociativeArrays["Python"] = True
hasNiceSyntaxForAssociativeArrays["Java"]
= False
legal Python code
5
Basic symbol table API
Java allows null value
・Values are not null.
・Method get() returns null if key not present.
・Method put() overwrites old value with new value.
public class ST<Key, Value>
void put(Key key, Value val)
Value get(Key key)
boolean contains(Key key)
void delete(Key key)
boolean isEmpty()
int size()
Iterable<Key> keys()
6
Conventions
Associative array abstraction. Associate one value with each key.
ST()
table is the only
primitive data structure
create an empty symbol table
put key-value pair into the table
value paired with key
Intended consequences.
a[key] = val;
・Easy to implement contains().
a[key]
is there a value paired with key?
public boolean contains(Key key)
{ return get(key) != null; }
remove key (and its value) from table
is the table empty?
・Can implement lazy version of delete().
number of key-value pairs in the table
all the keys in the table
public void delete(Key key)
{ put(key, null); }
7
8
Keys and values
Equality test
Value type. Any generic type.
All Java classes inherit a method equals().
specify Comparable in API.
Key type: several natural assumptions.
Java requirements. For any references x, y and z:
・Assume keys are Comparable, use compareTo().
・Assume keys are any generic type, use equals() to test equality.
・Assume keys are any generic type, use equals() to test equality;
・Reflexive:
・Symmetric:
・Transitive:
・Non-null:
use hashCode() to scramble key.
x.equals(x)
is true.
x.equals(y)
iff y.equals(x).
equivalence
relation
if x.equals(y) and y.equals(z), then x.equals(z).
x.equals(null)
is false.
built-in to Java
(stay tuned)
do x and y refer to
the same object?
Default implementation. (x == y)
Best practices. Use immutable types for symbol table keys.
Customized implementations. Integer, Double, String, java.io.File, …
・Immutable in Java: Integer, Double, String, java.io.File, …
・Mutable in Java: StringBuilder, java.net.URL, arrays, ...
User-defined implementations. Some care needed.
9
10
Implementing equals for user-defined types
Implementing equals for user-defined types
Seems easy.
Seems easy, but requires some care.
public
class Date implements Comparable<Date>
{
private final int month;
private final int day;
private final int year;
...
public final class Date implements Comparable<Date>
{
private final int month;
private final int day;
private final int year;
...
public boolean equals(Date that)
{
if (this.day
!= that.day ) return false;
if (this.month != that.month) return false;
if (this.year != that.year ) return false;
return true;
typically unsafe to use equals() with inheritance
(would violate symmetry)
public boolean equals(Object y)
{
if (y == this) return true;
check that all significant
fields are the same
}
must be Object.
Why? Experts still debate.
optimize for true object equality
if (y == null) return false;
check for null
if (y.getClass() != this.getClass())
return false;
objects must be in the same class
(religion: getClass() vs. instanceof)
Date that = (Date) y;
if (this.day
!= that.day ) return false;
if (this.month != that.month) return false;
if (this.year != that.year ) return false;
return true;
cast is guaranteed to succeed
check that all significant
fields are the same
}
}
}
11
12
Equals design
ST test client for traces
"Standard" recipe for user-defined types.
Build ST by associating value i with ith string from standard input.
・Optimization for reference equality.
・Check against null.
・Check that two objects are of the same type; cast.
・Compare each significant field:
but use Double.compare() with double
– if field is a primitive type, use ==
(to deal with -0.0 and NaN)
– if field is an object, use equals()
apply rule recursively
– if field is an array, apply to each entry
can use Arrays.deepEquals(a, b)
but not a.equals(b)
public static void main(String[] args)
{
ST<String, Integer> st = new ST<String, Integer>();
for (int i = 0; !StdIn.isEmpty(); i++)
keys S E A R C H
{
values 0 1 2 3 4 5
String key = StdIn.readString();
st.put(key, i);
output for
basic symbol table
}
(one possibility)
for (String s : st.keys())
L 11
StdOut.println(s + " " + st.get(s));
}
P 10
e.g., cached Manhattan distance
Best practices.
・No need to use calculated fields that depend on other fields.
・Compare fields mostly likely to differ first.
・Make compareTo() consistent with equals().
keys
13
ST test client for analysis
and print out one that occurs with highest frequency.
% more
it was
it was
it was
it was
it was
it was
it was
it was
it was
it was
tiny example
(60 words, 20 distinct)
% java FrequencyCounter 8 < tale.txt
business 122
real example
(135,635 words, 10,769 distinct)
% java FrequencyCounter 10 < leipzig1M.txt
government 24763
R
C
H
E
X
A
M
values 0
1
2
3
4
5
6
7
8
9 10 11 12A
L
P
M
11
10
9
X
H
7
5
C
4
A
8
C
4
E
H
L
M
P
12
5
11
9
10
ER
E
S
real example
(21,191,455 words, 534,580 distinct)
15
X
A
M
6
7
8
9 10 11 12
P
L
output
outputfor
ordered
symbol table
A
C
8
4
E
H
L
M
P
R
S
X
12
5
11
9
10
3
0
7
Keys, values, and output for test client
public class FrequencyCounter
R 3
{
3
A 8
public static void main(String[]
args) R
{
S 0
E 12
int minlen = Integer.parseInt(args[0]);
X Integer>();
7
S st0= new ST<String,
ST<String, Integer>
while (!StdIn.isEmpty())
Keys, values, and output for test client
{
String word = StdIn.readString();
ignore short strings
if (word.length() < minlen) continue;
if (!st.contains(word)) st.put(word, 1);
else
st.put(word, st.get(word) + 1);
}
String max = "";
st.put(max, 0);
for (String word : st.keys())
if (st.get(word) > st.get(max))
max = word;
StdOut.println(max + " " + st.get(max));
}
}
tinyTale.txt
the best of times
the worst of times
the age of wisdom
the age of foolishness
the epoch of belief
the epoch of incredulity
the season of light
the season of darkness
the spring of hope
the winter of despair
% java FrequencyCounter 1 < tinyTale.txt
it 10
4
3
8
12
0
A
Frequency counter implementation
Frequency counter. Read a sequence of strings from standard input
C
E
output for
ordered
symbol table
L
9
7
5
S
output for
basic symbol table
(one possibility)
x.equals(y) if and only if (x.compareTo(y) == 0)
P
M
X
H
E
14
create ST
read string and
update frequency
print a string
with max freq
16
E
Sequential search in a linked list
Data structure. Maintain an (unordered) linked list of key-value pairs.
Search. Scan through all keys until find a match.
Insert. Scan through all keys until find a match; if no match add to front.
3.1 S YMBOL T ABLES
key value first
‣ API
‣ elementary implementations
‣ ordered operations
Algorithms
S
0
S 0
red nodes
are new
E
1
E 1
S 0
A
2
A 2
E 1
S 0
R
3
R 3
A 2
E 1
S 0
C
4
C 4
R 3
A 2
E 1
S 0
H
5
H 5
C 4
R 3
A 2
E 1
S 0
E
6
H 5
C 4
R 3
A 2
E 6
S 0
black nodes
are accessed
in search
circled entries are
changed values
X
7
X 7
H 5
C 4
R 3
A 2
E 6
S 0
R OBERT S EDGEWICK | K EVIN W AYNE
A
8
X 7
H 5
C 4
R 3
A 8
E 6
S 0
http://algs4.cs.princeton.edu
M
9
M 9
X 7
H 5
C 4
R 3
A 8
E 6
gray nodes
are untouched
S 0
P 10
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 6
S 0
L 11
L 11
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 6
S 0
E 12
L 11
P 10
M 9
X 7
H 5
C 4
R 3
A 8
E 12
S 0
Trace of linked-list ST implementation for standard indexing client
18
Elementary ST implementations: summary
Binary search in an ordered array
Data structure. Maintain an ordered array of key-value pairs.
Rank helper function. How many keys < k ?
guarantee
average case
operations
on keys
implementation
sequential search
(unordered list)
search
insert
search hit
insert
N
N
N
N
keys[]
4 5 6 7 8 9
successful search for P
keys[]
lo hi m
A C E H L M P R S X
successful
search
for for
P
entries in black
0
4 M
5 6
successful
0search
9 4P
A 1
C 2
E 3
H keys[]
L
P 7
R 8
S 9
X
are a[lo..hi]
lo
hi
m
5 9 7
A
0 C
1 E
2 H
3 L
4 M
5 P
6 R
7 S
8 X
9
successful search for P
entries in black
0
A
5 9
6 4
5
A C
C E
E H
H L
L M
M P
P R
R S
S X
X
are a[lo..hi]
lo hi m
entry in red is a[m]
5
9 7
A
C E
H L
M P
R S
X
entries in black
6
A
0 6
9 6
4
A C
C E
E H
H L
L M
M P
P R
R S
S X
X
are a[lo..hi]
5
6
5
A
C
E
H
L
M
P
R
S
X
loop
exits
with
keys[m]
= P: return 6
unsuccessful
search
for
Q
5 9 7
A C E H L M P R S X
entry in red is a[m]
6 hi
6 m6
A C E H L M P R S X
lo
5 6 5
A C E H L M P R S X
exits with
keys[m]
P: return 6
entry
in red is=a[m]
unsuccessful
search4for Q A C E H L M P R loop
0
6 9
6 6
A C E H L M P R S
S X
X
lo
5 hi
9 form7Q
A C E H L M P R loop
S exits
X with keys[m] = P: return 6
unsuccessful
search
unsuccessful
search
for Q
0
A
5 9
6 4
5
A C
C E
E H
H L
L M
M P
P R
R S
S X
X
lo hi m
5
9
7
A
C
E
H
L
M
P
R
S
X
7
6
6
A
C
E
H
L
M
P
R
S
0 9 4
A C E H L M P R S X
X
5 6 5
A C E H L M P R S X
5 9 7 loop exits
A with
C Elo >H hi:
L return
M P7 R S X
7 6 6
A C E H L M P R S X
5 6 5
A C E H L M P R S X
Trace
of binary
searchreturn
for rank in an ordered array
loop
exits
7 6 6
A with
C Elo >H hi:
L M P7 R S X
0
equals()
Challenge. Efficient implementations of both search and insert.
1
2
3
loop
exitsof
with
lo >search
hi: return
Trace
binary
for rank7 in an ordered array
19
Trace of binary search for rank in an ordered array
20
Binary search: Java implementation
Elementary symbol tables: quiz 1
Implementing binary search was
public Value get(Key key)
{
if (isEmpty()) return null;
int i = rank(key);
if (i < N && keys[i].compareTo(key) == 0) return vals[i];
else return null;
}
number of keys < key
private int rank(Key key)
{
int lo = 0, hi = N-1;
while (lo <= hi)
{
int mid = lo + (hi - lo) / 2;
int cmp = key.compareTo(keys[mid]);
if
(cmp < 0) hi = mid - 1;
else if (cmp > 0) lo = mid + 1;
else if (cmp == 0) return mid;
}
return lo;
A.
Easier than I thought.
B.
About what I expected.
C.
Harder than I thought.
D.
Much harder than I thought.
E.
I don't know. (Well, you should!)
}
21
FIND THE FIRST 1
22
Binary search: trace of standard indexing client
Problem. To insert, need to shift all greater keys over.
Problem. Given an array with all 0s in the beginning and all 1s at the end,
find the index in the array where the 1s start.
key value
input
0
0
0
0
0
…
0
0
0
0
0
1
1
1
…
1
1
1
Variant 1. You are given the length of the array.
Variant 2. You are not given the length of the array.
0
1
2
3
keys[]
4 5 6
7
8
9
1
2
3
vals[]
4 5 6
N
0
1
0
2
3
4
1
2
2
0
1
1
0
3
0
2
2
4
4
1
1
3
5
0
3
0
S
0
S
E
A
R
1
2
3
E
A
A
S
E
E
S
R
S
C
H
4
5
A
A
C
C
E
E
R
H
S
R
S
entries in gray 5
did not move 6
E
X
6
7
A
A
C
C
E
E
H
H
R
R
S
S
X
6
7
2
2
4
4
6
6
5
5
3
3
0
0
A
M
8
9
A
A
C
C
E
E
H
H
R
M
S
R
X
S
X
7
8
8
8
4
4
6
6
5
5
3
9
0
3
P
L
E
10
11
12
A
A
A
C
C
C
E
E
E
H
H
H
M
L
L
P
M
M
R
P
P
S
R
R
X
S
S
X
X
9
10
10
8
8
8
A
C
E
H
L
M
P
R
S
X
8
entries in red
were inserted
7
8
9
entries in black
moved to the right
circled entries are
changed values
7
7
0
7
4 6
4 6
4 12
5 9 10 3
5 11 9 10
5 11 9 10
0
3
3
7
0
0
7
7
4 12
5 11
3
0
7
9 10
Trace of ordered-array ST implementation for standard indexing client
23
24
Elementary ST implementations: summary
guarantee
average case
operations
on keys
implementation
search
insert
search hit
insert
sequential search
(unordered list)
N
N
N
N
equals()
binary search
(ordered array)
log N
N
log N
N
compareTo()
3.1 S YMBOL T ABLES
‣ API
‣ elementary implementations
Algorithms
‣ ordered operations
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
Challenge. Efficient implementations of both search and insert.
25
Examples of ordered symbol table API
min()
get(09:00:13)
floor(09:05:00)
select(7)
keys(09:15:00, 09:25:00)
ceiling(09:30:00)
max()
Ordered symbol table API
keys
values
09:00:00
09:00:03
09:00:13
09:00:59
09:01:10
09:03:13
09:10:11
09:10:25
09:14:25
09:19:32
09:19:46
09:21:05
09:22:43
09:22:54
09:25:52
09:35:21
09:36:14
09:37:44
Chicago
Phoenix
Houston
Chicago
Houston
Chicago
Seattle
Seattle
Phoenix
Chicago
Chicago
Chicago
Seattle
Seattle
Chicago
Chicago
Seattle
Phoenix
public class ST<Key extends Comparable<Key>, Value>
...
smallest key
Key max()
largest key
Key floor(Key key)
Key ceiling(Key key)
smallest key greater than or equal to key
number of keys less than key
Key select(int k)
key of rank k
void deleteMin()
delete smallest key
void deleteMax()
delete largest key
Iterable<Key> keys()
Iterable<Key> keys(Key lo, Key hi)
27
largest key less than or equal to key
int rank(Key key)
int size(Key lo, Key hi)
size(09:15:00, 09:25:00) is 5
rank(09:10:25) is 7
Examples of ordered symbol-table operations
Key min()
number of keys between lo and hi
all keys, in sorted order
keys between lo and hi, in sorted order
28
Algorithms
Binary search: ordered symbol table operations summary
sequential
search
binary
search
search
N
log N
insert
N
N
min / max
N
1
R OBERT S EDGEWICK | K EVIN W AYNE
3.2 B INARY S EARCH T REES
‣ BSTs
‣ ordered operations
floor / ceiling
N
log N
rank
N
log N
select
N
1
ordered iteration
N log N
N
Algorithms
F O U R T H
‣ deletion
E D I T I O N
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
order of growth of the running time for ordered symbol table operations
29
Binary search trees
Definition. A BST is a binary tree in symmetric order.
root
a left link
A binary tree is either:
3.2 B INARY S EARCH T REES
a subtree
・Empty.
・Two disjoint binary trees (left and right).
right child
of root
‣ BSTs
null links
Anatomy of a binary tree
‣ ordered operations
Algorithms
‣ deletion
Symmetric order. Each node has a key,
and every node’s key is:
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
・Larger than all keys in its left subtree.
・Smaller than all keys in its right subtree.
parent of A and R
left link
of E
S
E
X
A
R
C
keys smaller than E
key
H
9
value
associated
with R
keys larger than E
Anatomy of a binary search tree
3
Binary search tree demo
Binary search tree demo
Search. If less, go left; if greater, go right; if equal, search hit.
Insert. If less, go left; if greater, go right; if null, insert.
successful search for H
insert G
S
S
E
X
A
E
R
C
X
A
H
R
C
H
G
M
M
4
BST representation in Java
5
BST implementation (skeleton)
Java definition. A BST is a reference to a root Node.
public class BST<Key extends Comparable<Key>, Value>
{
private Node root;
A Node is composed of four fields:
・A Key and a Value.
・A reference to the left and right subtree.
smaller keys
private class Node
{ /* see previous slide */
}
public void put(Key key, Value val)
{ /* see next slides */ }
larger keys
private class Node
{
private Key key;
private Value val;
private Node left, right;
public Node(Key key, Value val)
{
this.key = key;
this.val = val;
}
}
root of BST
public Value get(Key key)
{ /* see next slides */ }
BST
Node
key
left
BST with smaller keys
val
public void delete(Key key)
{ /* see next slides */ }
right
public Iterable<Key> iterator()
{ /* see next slides */ }
BST with larger keys
}
Binary search tree
Key and Value are generic types; Key is Comparable
6
7
BST search: Java implementation
BST insert
Get. Return value corresponding to given key, or null if no such key.
Put. Associate value with key.
inserting L
S
E
X
A
public Value get(Key key)
{
Node x = root;
while (x != null)
{
int cmp = key.compareTo(x.key);
if
(cmp < 0) x = x.left;
else if (cmp > 0) x = x.right;
else if (cmp == 0) return x.val;
}
return null;
}
R
・
・Key not in tree ⇒
H
C
Search for key, then two cases:
M
Key in tree ⇒ reset value.
P
search for L ends
at this null link
add new node.
S
E
X
A
R
H
C
M
create new node
P
L
S
E
X
A
R
H
C
M
reset links
on the way up
Cost. Number of compares = 1 + depth of node.
L
P
Insertion into a BST
8
9
best case
H
S
C
BST insert: Java implementation
Tree shape
Put. Associate value with key.
・Many BSTs correspond to same set of keys. typical case
case
search/insert
= 1 + depth of node.
・Number of compares for best
H
E
public void put(Key key, Value val)
{ root = put(root, key, val); }
concise, but tricky,
recursive code;
read carefully!
S
C
private Node put(Node x, Key key, Value val)
{
if (x == null) return new Node(key, val);
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = put(x.left, key, val);
else if (cmp > 0) x.right = put(x.right, key, val);
else if (cmp == 0) x.val = val;
return x;
}
best case
R
E
X
A
typical case
X
A
R
H
S
A
worst case
C
BST possibilities
H
H
C
H
E
R
E
X
S
E
X
worst case
C
R
C
S
H
C
X
X
R
A
S
E
S
C
A
X
typical case
H
A
R
E
A
R
E
A
R
S
BottomAline. Tree
depends on order of X
insertion.
worstshape
case
Cost. Number of compares = 1 + depth of node.
C
E
10
BST possibilities
H
11
R
S
BST insertion: random order visualization
Binary search trees: quiz 1
Ex. Insert keys in random order.
In what order does the traverse(root) code print out the keys in the BST?
private void traverse(Node x)
{
if (x == null) return;
traverse(x.left);
StdOut.println(x.key);
traverse(x.right);
}
A.
A C E H M R S X
B.
A C E R H M X S
C.
S E A C R H M X
D.
C A M H R E X S
E.
S
E
X
A
R
C
H
M
I don't know.
12
13
Inorder traversal
Binary search trees: quiz 2
・Traverse left subtree.
・Enqueue key.
・Traverse right subtree.
What is the name of this sorting algorithm?
1. Shuffle the keys.
2. Insert the keys into a BST, one at a time.
public Iterable<Key> keys()
{
Queue<Key> q = new Queue<Key>();
inorder(root, q);
return q;
}
private void inorder(Node x, Queue<Key> q)
{
if (x == null) return;
inorder(x.left, q);
q.enqueue(x.key);
inorder(x.right, q);
}
3. Do an inorder traversal of the BST.
BST
key
left
val
right
smaller keys, in order
A.
Insertion sort.
B.
Mergesort.
C.
Quicksort.
D.
None of the above.
E.
I don't know.
BST with larger keys
BST with smaller keys
key
larger keys, in order
all keys, in order
Property. Inorder traversal of a BST yields keys in ascending order.
14
15
Correspondence between BSTs and quicksort partitioning
BSTs: mathematical analysis
Proposition. If N distinct keys are inserted into a BST in random order,
the expected number of compares for a search/insert is ~ 2 ln N.
Pf. 1–1 correspondence with quicksort partitioning.
P
H
T
D
O
E
A
S
U
I
Proposition. [Reed, 2003] If N distinct keys are inserted into a BST
in random order, the expected height is ~ 4.311 ln N.
Y
C
M
expected depth of
function-call stack in quicksort
How Tall is a Tree?
How Tall is a Tree?
Bruce Reed
L
Bruce Reed
CNRS, Paris, France
CNRS, Paris, France
reed@moka.ccr.jussieu.fr
reed@moka.ccr.jussieu.fr
ABSTRACT
purpose of this note to p
ABSTRACT
But… Worst-case height is N – 1.
Remark. Correspondence is 1–1 if array has no duplicate keys.
[ exponentially small chance when keys are inserted in random order ]
16
ST implementations: summary
guarantee
average case
implementation
search
search hit
insert
operations
on keys
3.2 B INARY S EARCH T REES
sequential search
(unordered list)
N
binary search
(ordered array)
log N
BST
insert
have:
purpose
this note
that
for /3 -- ½ + ~ 3,
we
Let H~ be the height of a random
binaryof search
tree toonprove
n
nodes.
Wesearch
show tree
that on
there
constants a = 4.31107...
Let H~ be the height of a random
binary
n existshave:
THEOREM 1. E(H~) =
n d / 3 = 1.95...
such that E(H~) = c~logn - / 3 1 o g l o g n +
nodes. We show that there existsaconstants
a = 4.31107...
.
We also
O(1).
1. E(H~) = ~ l o g nVar(Hn)
- / 3 1 o g l=o gO(1)
n + O(1)
and
a n d / 3 = 1.95... such that E(H~)O(1),
= c~logn
- / 3show
1 o g l othat
g n +Var(H~) =THEOREM
Var(Hn) = O(1) .
O(1), We also show that Var(H~) = O(1).
R e m a r k By the definitio
Categories and Subject Descriptors
given
is more
R e m a r k By the definition of a, nition
/3 = 7"g~"
3~ The
firstsugge
defi
Categories and Subject Descriptors
we this
will see.
E.2 [ D a t a S t r u c t u r e s ] : Trees nition given is more suggestive ofaswhy
value is correct,
For
more
information
on
as
we
will
see.
E.2 [ D a t a S t r u c t u r e s ] : Trees
consult
[6],[7],
[1],one
[2
For more information on randommay
binary
search
trees,
1. THE RESULTS
R eand
m a r[8].
k After I announce
may consult [6],[7], [1], [2], [9], [4],
A binary search tree is a binary tree to each node of which
1. THE RESULTS
an alternative p
R e m a r k After I announced these developed
results, Drmota(unpublish
we have
associated
key; these keys axe drawn from some
using
A binary search tree is a binary tree
to each
node of awhich
developed an alternative proof ofO(1)
the fact
thatcompletely
Var(Hn) d=
totally
set and
the key at v cannot be larger than
illuminate
we have associated a key; these keys
axeordered
drawn from
some
O(1) using completely different proofs
techniques.
our two
17 As different
at its
than the key at its left
submit we
thehave
jou
totally ordered set and the key atthev key
cannot
beright
largerchild
thannor smaller
proofs illuminate different aspectsdecided
of the to
problem,
child.
a binary
search tree T and a new key k, we
and asked
thatsame
theyjourna
be pu
the key at its right child nor smaller
thanGiven
the key
at its left
decided to submit the journal versions
to the
insert
k
into
T
by
traversing
the
tree
starting
at
the
root
child. Given a binary search tree T and a new key k, we
and asked that they be published side by side.
insert k into T by traversing theand
treeinserting
starting katinto
the the
rootfirst empty position at which we
2. A MODEL
arrive.
We traverse
the tree
and inserting k into the first empty
position
at which
we by moving to the left child of the
If we construct a binary
current
nodeleft
if child
k is smaller
the current
key and moving
A MODEL
arrive. We traverse the tree by moving
to the
of the than 2.
1, ...,
n and
i is the fir
to current
the right
otherwise. Given
permutation
If we some
construct
a binaryofsearch oftree
from
a permutation
current node if k is smaller than the
keychild
and moving
i
at the rootthen:
of
a
set
of
keys,
we
construct
a
binary
search
tree
from
this
of 1, ..., n and i is the first key inappears
the permutation
to the right child otherwise. Given some permutation of
left the
child
i contains
th
permutation
by inserting
the given
intoofanthe tree,
at order
the root
treeof rooted
at the
a set of keys, we construct a binary
search tree
from this them iinappears
depends
only
on its
theshape
orde
initially
empty
tree.
left
child
of
i
contains
the
keys
1,
...,
i
1
and
permutation by inserting them in the given order into an
the permutation,
mad thein
The height Hn of a random binary
search
T,~ order
on n in which
depends
only tree
on the
these keys appear
initially empty tree.
contains
theright
keys child
i + 1,of...,i
nodes,search
constructed
from amad
random
the starting
permutation,
the tree rooted
at the
The height Hn of a random binary
tree T,~inonthis
n manner
order in
which only
theseonk
equiprobable
permutation
. . , n, is known
the keystoi be
+ 1,close
..., n and the
its shape
depends
nodes, constructed in this manner
starting from
a random of 1 , .contains
Frominthis
a l oisg nknown
whereto abe= close
4.31107...the
is order
the unique
solution
in which
these on
keys appear
the observation,
permutation.one
equiprobable permutation of 1 , . . to
. , n,
ber
of
levels
of
recursion
[2,
~
)
of
the
equation
a
log((2e)/a)
=
1
(here
and
elsewhere,
From this observation, one deduces that Hn is also the numto a l o g n where a = 4.31107... is the unique solution on
(i.e.
the Vazfilla
version Quicksort
of Quick
log= is1 the
logarithm). First,
Pittel[10]
thatrequired
ber of
levels ofshowed
recursion
when
[2, ~ ) of the equation a log((2e)/a)
(herenatural
and elsewhere,
permuation
chosen in
a
H,~/log
n --~ 3'showed
almostthat
surely as (i.e.
n --+the
c~ version
for some
of positive
Quicksort in the
which
the first is
element
log is the natural logarithm). First,
Pittel[10]
permutation
n.
7. This
not to exceed
c~ [11],
the permuation
is chosen
as the pivot)
is appliedofto1,a ...,
random
H,~/log n --~ 3' almost surely as constant
n --+ c~ for
some constant
positive was known
Our observation also allow
and
Devroye[3]
showed
that
"7
=
a,
as
a
consequence
of
the
permutation
of
1,
...,
n.
constant 7. This constant was known not to exceed c~ [11],
To ease
our the
exposit
factas that
E ( H n ) ~ c~logn.
has found
that
Hn us to down.
Our observation
also
allows
construct
Tn from
top
and Devroye[3] showed that "7 = a,
a consequence
of the Robson[12]
a subtree
, the co
does not
much
to ease
experiment,
and
down. To
our exposition,
weofthink
of T,~of
as Ta~labelling
fact that E ( H n ) ~ c~logn. Robson[12]
hasvary
found
thatfrom
Hn experiment
We
will
expose
the
seems to
to have
a fixed range
not depending
n.
of a subtree
of T ~ , upon
the complete
infinite binary tree. key as
does not vary much from experiment
experiment,
and of width
Towith
underscore
the trelation
Devroye
and Reed[5]
proved
Var(Hn)
= O((log
will expose
thelog
keyn)2),
associated
each node
of T~.
seems to have a fixed range of width
not depending
upon
n. that We
the Quicksort,
key at t as we
therefer
pivottoa
but this =
does
not log
quite
confirm Robson's
findings.
is the
To underscore
the It
relationship
with
Devroye and Reed[5] proved that Var(Hn)
O((log
n)2),
exposed then
the pivots
forhave
som
the key at t as the pivot at t. Suppose
that we
but this does not quite confirm Robson's findings. It is the
Too, rooted
at athe
root
exposed the pivots for some of theof nodes
forming
subtree
some node further
t of T~¢,
all for
of
of of
Too,
at the
that
Permission to make digital or hard copies
all orrooted
part of this
workroot
for of T ~ . weSuppose
have of
chosen
somefeenode
t ofthat
T~¢,
all of the ancestors
t are their
in T,~pivot
and
or classroom
use for
is granted without
provided
copies
Permission to make digital or hard copiespersonal
of all or part
of this work
the
set
of
keys
Kt
which
w
we have chosen their pivots. Then, these choices determine
are fee
notprovided
made or distributed
personal or classroom use is granted without
that copiesfor profit or commercialadvantageand that
subtree
T,~ rootedempty)
at t, b
theon
set
keys
KtTowhich
at theof(possibly
copies bearadvantage
this noticeand
andthat
the full citation
theoffirst
page.
copy will appear
are not made or distributedfor profit or commercial
in
which
we
expect
the
ke
otherwise,
to page.
republish,
to post on servers
or to redistribute
to lists,
subtree
of T,~ rooted
at t, but will have no effect on the order
copies bear this notice and the full citation
on the first
To copy
permutation
Kt is each
equa
requires prior specific permissionand/orina fee.
which we expect the keys in Kt
to appear. ofIndeed
otherwise, to republish,to post on servers or to redistributeto lists,
in Kt will be equally likel
requires prior specific permissionand/or aSTOC
fee. 2000 Portland Oregon USApermutation of Kt is equally likely. Thus, each of the keys
keys
Copyright
ACM
2000
1-58113-184-4/00/5...$5.00
in Kt will be equally likely to bethe
thenumber
pivot. of
We
let in
nt this
be
STOC 2000 Portland Oregon USA
the number of keys in this set and specify the pivot at t by
Copyright ACM 2000 1-58113-184-4/00/5...$5.00
N
N
N
equals()
‣ BSTs
N
N
N
log N
log N
N
log N
‣ ordered operations
compareTo()
Algorithms
compareTo()
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
Why not shuffle to ensure a (probabilistic) guarantee of log N?
18
‣ deletion
479
479
Minimum and maximum
Floor and ceiling
Minimum. Smallest key in table.
Floor. Largest key ≤ a given key.
Maximum. Largest key in table.
Ceiling. Smallest key ≥ a given key.
S
min()
E
min
floor(G)
max()max
X
A
R
C
max()
S
min()
E
X
A
H
R
C
H
M
ceiling(Q)
M
floor(D)
Examples of BST order queries
Examples of BST order queries
Q. How to find the min / max?
Q. How to find the floor / ceiling?
20
Computing the floor
Computing the floor
finding
finding floor(S)
floor(S)
Floor. Find largest key ≤ k ?
Case 1. [ key in node x = k ]
The floor of k is in the left subtree of x.
S
S
finding floor(S)
E
E
A
A
E
C
AC
R
R
H
H
M
HM
M
C
The floor of k is k.
Case 2. [ key in node x > k ]
E
E
E
H
H
X
SX
X
R
R
if (cmp == 0) return x;
R
S
S is
is greater
greater than
than G
G so
so
M floor(G)
HM
floor(G) must
must be
be
S is greater
than G so
on
on the
the left
left
M floor(G)
must be
C
Case 3. [ key in node x < k ]
public Key floor(Key key)
{
Node x = floor(root, key);
if (x == null) return null;
return x.key;
}
private Node floor(Node x, Key key)
{
if (x == null) return null;
int cmp = key.compareTo(x.key);
S
S
finding floor(G)
C
AC
finding floor(G)
X
SX
X
R
finding
finding floor(G)
floor(G)
A
A
if (cmp < 0)
on the left
The floor of k can't be in left subtree of x:
it is either in the right subtree of x or
it is the key in node x.
21
A
A
C
AC
E
Node t = floor(x.right, key);
if (t != null) return t;
else
return x;
S
S
E
E
H
H
M
C so H M
E
is less
less than
than G
G so
E is
floor(G)
could
be
M
floor(G) could be
is less
Eon
the
right
on
the than
right G so
floor(G) could be
R
R
X
SX
X
X
A
R
H
C
M
G is less than S so
floor(G) must be
on the left
S
E
X
A
R
H
C
G is greater than E so
floor(G) could be
M
on the right
S
E
X
A
R
H
C
M
floor(G)in left
subtree is null
S
}
R
onComputing
the right the
Computing
the floor
floor function
function
return floor(x.left, key);
S
E
E
A
C
result
X
R
H
M
22
23
Computing the floor function
Rank and select
BST implementation: subtree counts
Q. How to implement rank() and select() efficiently?
private class Node
{
private Key key;
private Value val;
private Node left;
private Node right;
private int count;
}
A. In each node, store the number of nodes in its subtree.
subtree count
8
S
6
E
2
A
3
1
2
C
H
1
public int size()
{ return size(root);
}
private int size(Node x)
{
if (x == null) return 0;
ok to call
return x.count;
when x is null
}
number of nodes in subtree
X
initialize subtree
private Node put(Node x, Key key, Value val)
count to 1
{
if (x == null) return new Node(key, val, 1);
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = put(x.left, key, val);
else if (cmp > 0) x.right = put(x.right, key, val);
else if (cmp == 0) x.val = val;
x.count = 1 + size(x.left) + size(x.right);
return x;
}
R
1
M
24
Computing the rank
Rank. How many keys < k ?
Rank
no key in right subtree < k.
Case 2. [ key in node x > k ]
Rank. How many keys < k ?
node count
8
6
S
Case 1. [ key in node = k ]
All keys in left subtree < k;
25
E
2
A
3
1
2
C
H
1
node count
8
6
S
Easy recursive algorithm (3 cases!)
X
E
2
R
A
1
M
3
1
2
C
H
1
X
R
1
M
No key in right subtree < k;
public int rank(Key key)
{ return rank(key, root);
recursively compute rank in left subtree.
Case 3. [ key in node x < k ]
}
private int rank(Key key, Node x)
{
if (x == null) return 0;
int cmp = key.compareTo(x.key);
if
(cmp < 0) return rank(key, x.left);
else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
else if (cmp == 0) return size(x.left);
}
All keys in left subtree < k;
some keys in right subtree may be < k.
26
27
BST: ordered symbol table operations summary
sequential
search
binary
search
BST
search
N
log N
h
insert
N
N
h
min / max
N
1
h
floor / ceiling
N
log N
h
rank
N
log N
h
select
N
1
h
N log N
N
N
ordered iteration
3.2 B INARY S EARCH T REES
h = height of BST
(proportional to log N
if keys inserted in random order)
‣ BSTs
‣ ordered operations
‣ deletion
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
order of growth of running time of ordered symbol table operations
28
ST implementations: summary
BST deletion: lazy approach
To remove a node with a given key:
guarantee
ordered
ops?
implementation
search
insert
・Set its value to null.
・Leave key in tree to guide search (but don't consider it equal in search).
average case
delete
search hit
insert
key
interface
delete
E
sequential search
(unordered list)
N
N
N
N
N
N
binary search
(ordered array)
log N
N
N
log N
N
N
✔
compareTo()
BST
N
N
N
log N
log N
?
✔
compareTo()
E
equals()
A
S
A
S
delete I
C
H
☠
C
I
R
N
H
tombstone
R
N
Cost. ~ 2 ln N' per insert, search, and delete (if keys in random order),
where N' is the number of key-value pairs ever inserted in the BST.
Next. Deletion in BSTs.
Unsatisfactory solution. Tombstone (memory) overload.
30
31
Deleting the minimum
Hibbard deletion
To delete the minimum key:
To delete a node with key k: search for node t containing key k.
・Go left until finding a node with a null left link.
・Replace that node by its right link.
・Update subtree counts.
go left until
reaching null
left link
A
S
X
E
Case 0. [0 children] Delete t by setting parent link to null.
R
H
C
M
return that
node’s right link
R
E
M
available for
garbage collection
E
X
A
replace with
null link
node to delete
S
5
R
C
R
H
M
M
available for
garbage
collection
X
E
X
A
H
M
7
E
1
R
H
7
S
5
X
A
R
C
update links and node counts
after recursive calls
private Node deleteMin(Node x)
{
if (x.left == null) return x.right;
x.left = deleteMin(x.left);
x.count = 1 + size(x.left) + size(x.right);
return x;
}
S
S
H
C
}
update counts after
recursive calls
X
E
A
public void deleteMin()
{ root = deleteMin(root);
deleting C
S
C
H
M
Deleting the minimum in a BST
32
Hibbard deletion
33
Hibbard deletion
deleting E
node to delete
To delete a node with key k: search for node t containing key k.
To delete a node with key k: search for node
A
H
C
Case 1. [1 child] Delete t by replacing parent link.
S
tEcontainingX
R
M
Case 2. [2 children]
・Find successor x of t.
・Delete the minimum in t's right subtree.
A
C
・Put x in t's spot.
search for key E
t
deleting R
update counts after
recursive calls
S
S
E
X
A
R
C
H
M
node to delete
E
X
H
A
C
S
5
E
M
node to delete
X
H
A
C
S
E
X
A
M
E
H
available for
garbage
collection
M
x
search for key E
M
go right, then
go left until
reaching null
left link
x
34
t.left
A
successor
min(t.right)
H
X
deleteMin(t.right)
R
S
E
X
deleteMin(t.right)
M
7
S
5
H
A
X
R
C
S
E
min(t.right)
R
R
H
M
H
X
x
C
still a BST
successor
H
C
S
E
A
R
A
t
R
t.left
x has no left child
but
X don't garbage collect x
x
go right, then
go left until
reaching null
left link
R
C
replace with
child link
S
deleting E
7
key k.
M update links and
node counts after
recursive calls
Deletion in a BST
35
Hibbard deletion: Java implementation
Hibbard deletion: analysis
Unsatisfactory solution. Not symmetric.
public void delete(Key key)
{ root = delete(root, key);
}
private Node delete(Node x, Key key) {
if (x == null) return null;
int cmp = key.compareTo(x.key);
if
(cmp < 0) x.left = delete(x.left, key);
else if (cmp > 0) x.right = delete(x.right, key);
else {
if (x.right == null) return x.left;
if (x.left == null) return x.right;
search for key
no right child
no left child
Node t = x;
x = min(t.right);
x.right = deleteMin(t.right);
x.left = t.left;
replace with
successor
}
x.count = size(x.left) + size(x.right) + 1;
return x;
update subtree
counts
Surprising consequence. Trees not random (!) ⇒ √ N per op.
}
Longstanding open problem. Simple and efficient delete for BSTs.
36
ST implementations: summary
guarantee
average case
ordered
ops?
implementation
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N
N
N
N
N
N
binary search
(ordered array)
log N
N
N
log N
N
N
✔
compareTo()
BST
N
N
N
log N
log N
√N
✔
compareTo()
equals()
other operations also become √N
if deletions allowed
Next lecture. Guarantee logarithmic performance for all operations.
38
37