Algorithms R OBERT S EDGEWICK | K EVIN W AYNE Data structures “ Smart data structures and dumb code works a lot better than the other way around. ” – Eric S. Raymond 3.1 S YMBOL T ABLES ‣ API ‣ elementary implementations Algorithms F O U R T H ‣ ordered operations E D I T I O N R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu 2 Symbol tables Key-value pair abstraction. ・Insert a value with specified key. ・Given a key, search for the corresponding value. 3.1 S YMBOL T ABLES ‣ API Ex. DNS lookup. ・Insert domain name with specified IP address. ・Given domain name, find corresponding IP address. ‣ elementary implementations Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ‣ ordered operations domain name IP address www.cs.princeton.edu 128.112.136.11 www.princeton.edu 128.112.128.15 www.yale.edu 130.132.143.21 www.harvard.edu 128.103.060.55 www.simpsons.com 209.052.165.60 key value 4 Symbol table applications Symbol tables: context Also known as: maps, dictionaries, associative arrays. application purpose of search key value dictionary find definition word definition book index find relevant pages term list of page numbers file share find song to download name of song computer ID financial account process transactions account number transaction details web search find relevant web pages keyword list of page names compiler find properties of variables variable name type and value routing table route Internet packets destination best route DNS find IP address domain name IP address reverse DNS find domain name IP address domain name genomics find markers DNA string known positions file system find file on disk filename location on disk Generalizes arrays. Keys need not be between 0 and N – 1. Language support. ・External libraries: C, VisualBasic, Standard ML, bash, ... ・Built-in libraries: Java, C#, C++, Scala, ... ・Built-in to language: Awk, Perl, PHP, Tcl, JavaScript, Python, Ruby, Lua. every array is an associative array every object is an associative array hasNiceSyntaxForAssociativeArrays["Python"] = True hasNiceSyntaxForAssociativeArrays["Java"] = False legal Python code 5 Basic symbol table API Java allows null value ・Values are not null. ・Method get() returns null if key not present. ・Method put() overwrites old value with new value. public class ST<Key, Value> void put(Key key, Value val) Value get(Key key) boolean contains(Key key) void delete(Key key) boolean isEmpty() int size() Iterable<Key> keys() 6 Conventions Associative array abstraction. Associate one value with each key. ST() table is the only primitive data structure create an empty symbol table put key-value pair into the table value paired with key Intended consequences. a[key] = val; ・Easy to implement contains(). a[key] is there a value paired with key? public boolean contains(Key key) { return get(key) != null; } remove key (and its value) from table is the table empty? ・Can implement lazy version of delete(). number of key-value pairs in the table all the keys in the table public void delete(Key key) { put(key, null); } 7 8 Keys and values Equality test Value type. Any generic type. All Java classes inherit a method equals(). specify Comparable in API. Key type: several natural assumptions. Java requirements. For any references x, y and z: ・Assume keys are Comparable, use compareTo(). ・Assume keys are any generic type, use equals() to test equality. ・Assume keys are any generic type, use equals() to test equality; ・Reflexive: ・Symmetric: ・Transitive: ・Non-null: use hashCode() to scramble key. x.equals(x) is true. x.equals(y) iff y.equals(x). equivalence relation if x.equals(y) and y.equals(z), then x.equals(z). x.equals(null) is false. built-in to Java (stay tuned) do x and y refer to the same object? Default implementation. (x == y) Best practices. Use immutable types for symbol table keys. Customized implementations. Integer, Double, String, java.io.File, … ・Immutable in Java: Integer, Double, String, java.io.File, … ・Mutable in Java: StringBuilder, java.net.URL, arrays, ... User-defined implementations. Some care needed. 9 10 Implementing equals for user-defined types Implementing equals for user-defined types Seems easy. Seems easy, but requires some care. public class Date implements Comparable<Date> { private final int month; private final int day; private final int year; ... public final class Date implements Comparable<Date> { private final int month; private final int day; private final int year; ... public boolean equals(Date that) { if (this.day != that.day ) return false; if (this.month != that.month) return false; if (this.year != that.year ) return false; return true; typically unsafe to use equals() with inheritance (would violate symmetry) public boolean equals(Object y) { if (y == this) return true; check that all significant fields are the same } must be Object. Why? Experts still debate. optimize for true object equality if (y == null) return false; check for null if (y.getClass() != this.getClass()) return false; objects must be in the same class (religion: getClass() vs. instanceof) Date that = (Date) y; if (this.day != that.day ) return false; if (this.month != that.month) return false; if (this.year != that.year ) return false; return true; cast is guaranteed to succeed check that all significant fields are the same } } } 11 12 Equals design ST test client for traces "Standard" recipe for user-defined types. Build ST by associating value i with ith string from standard input. ・Optimization for reference equality. ・Check against null. ・Check that two objects are of the same type; cast. ・Compare each significant field: but use Double.compare() with double – if field is a primitive type, use == (to deal with -0.0 and NaN) – if field is an object, use equals() apply rule recursively – if field is an array, apply to each entry can use Arrays.deepEquals(a, b) but not a.equals(b) public static void main(String[] args) { ST<String, Integer> st = new ST<String, Integer>(); for (int i = 0; !StdIn.isEmpty(); i++) keys S E A R C H { values 0 1 2 3 4 5 String key = StdIn.readString(); st.put(key, i); output for basic symbol table } (one possibility) for (String s : st.keys()) L 11 StdOut.println(s + " " + st.get(s)); } P 10 e.g., cached Manhattan distance Best practices. ・No need to use calculated fields that depend on other fields. ・Compare fields mostly likely to differ first. ・Make compareTo() consistent with equals(). keys 13 ST test client for analysis and print out one that occurs with highest frequency. % more it was it was it was it was it was it was it was it was it was it was tiny example (60 words, 20 distinct) % java FrequencyCounter 8 < tale.txt business 122 real example (135,635 words, 10,769 distinct) % java FrequencyCounter 10 < leipzig1M.txt government 24763 R C H E X A M values 0 1 2 3 4 5 6 7 8 9 10 11 12A L P M 11 10 9 X H 7 5 C 4 A 8 C 4 E H L M P 12 5 11 9 10 ER E S real example (21,191,455 words, 534,580 distinct) 15 X A M 6 7 8 9 10 11 12 P L output outputfor ordered symbol table A C 8 4 E H L M P R S X 12 5 11 9 10 3 0 7 Keys, values, and output for test client public class FrequencyCounter R 3 { 3 A 8 public static void main(String[] args) R { S 0 E 12 int minlen = Integer.parseInt(args[0]); X Integer>(); 7 S st0= new ST<String, ST<String, Integer> while (!StdIn.isEmpty()) Keys, values, and output for test client { String word = StdIn.readString(); ignore short strings if (word.length() < minlen) continue; if (!st.contains(word)) st.put(word, 1); else st.put(word, st.get(word) + 1); } String max = ""; st.put(max, 0); for (String word : st.keys()) if (st.get(word) > st.get(max)) max = word; StdOut.println(max + " " + st.get(max)); } } tinyTale.txt the best of times the worst of times the age of wisdom the age of foolishness the epoch of belief the epoch of incredulity the season of light the season of darkness the spring of hope the winter of despair % java FrequencyCounter 1 < tinyTale.txt it 10 4 3 8 12 0 A Frequency counter implementation Frequency counter. Read a sequence of strings from standard input C E output for ordered symbol table L 9 7 5 S output for basic symbol table (one possibility) x.equals(y) if and only if (x.compareTo(y) == 0) P M X H E 14 create ST read string and update frequency print a string with max freq 16 E Sequential search in a linked list Data structure. Maintain an (unordered) linked list of key-value pairs. Search. Scan through all keys until find a match. Insert. Scan through all keys until find a match; if no match add to front. 3.1 S YMBOL T ABLES key value first ‣ API ‣ elementary implementations ‣ ordered operations Algorithms S 0 S 0 red nodes are new E 1 E 1 S 0 A 2 A 2 E 1 S 0 R 3 R 3 A 2 E 1 S 0 C 4 C 4 R 3 A 2 E 1 S 0 H 5 H 5 C 4 R 3 A 2 E 1 S 0 E 6 H 5 C 4 R 3 A 2 E 6 S 0 black nodes are accessed in search circled entries are changed values X 7 X 7 H 5 C 4 R 3 A 2 E 6 S 0 R OBERT S EDGEWICK | K EVIN W AYNE A 8 X 7 H 5 C 4 R 3 A 8 E 6 S 0 http://algs4.cs.princeton.edu M 9 M 9 X 7 H 5 C 4 R 3 A 8 E 6 gray nodes are untouched S 0 P 10 P 10 M 9 X 7 H 5 C 4 R 3 A 8 E 6 S 0 L 11 L 11 P 10 M 9 X 7 H 5 C 4 R 3 A 8 E 6 S 0 E 12 L 11 P 10 M 9 X 7 H 5 C 4 R 3 A 8 E 12 S 0 Trace of linked-list ST implementation for standard indexing client 18 Elementary ST implementations: summary Binary search in an ordered array Data structure. Maintain an ordered array of key-value pairs. Rank helper function. How many keys < k ? guarantee average case operations on keys implementation sequential search (unordered list) search insert search hit insert N N N N keys[] 4 5 6 7 8 9 successful search for P keys[] lo hi m A C E H L M P R S X successful search for for P entries in black 0 4 M 5 6 successful 0search 9 4P A 1 C 2 E 3 H keys[] L P 7 R 8 S 9 X are a[lo..hi] lo hi m 5 9 7 A 0 C 1 E 2 H 3 L 4 M 5 P 6 R 7 S 8 X 9 successful search for P entries in black 0 A 5 9 6 4 5 A C C E E H H L L M M P P R R S S X X are a[lo..hi] lo hi m entry in red is a[m] 5 9 7 A C E H L M P R S X entries in black 6 A 0 6 9 6 4 A C C E E H H L L M M P P R R S S X X are a[lo..hi] 5 6 5 A C E H L M P R S X loop exits with keys[m] = P: return 6 unsuccessful search for Q 5 9 7 A C E H L M P R S X entry in red is a[m] 6 hi 6 m6 A C E H L M P R S X lo 5 6 5 A C E H L M P R S X exits with keys[m] P: return 6 entry in red is=a[m] unsuccessful search4for Q A C E H L M P R loop 0 6 9 6 6 A C E H L M P R S S X X lo 5 hi 9 form7Q A C E H L M P R loop S exits X with keys[m] = P: return 6 unsuccessful search unsuccessful search for Q 0 A 5 9 6 4 5 A C C E E H H L L M M P P R R S S X X lo hi m 5 9 7 A C E H L M P R S X 7 6 6 A C E H L M P R S 0 9 4 A C E H L M P R S X X 5 6 5 A C E H L M P R S X 5 9 7 loop exits A with C Elo >H hi: L return M P7 R S X 7 6 6 A C E H L M P R S X 5 6 5 A C E H L M P R S X Trace of binary searchreturn for rank in an ordered array loop exits 7 6 6 A with C Elo >H hi: L M P7 R S X 0 equals() Challenge. Efficient implementations of both search and insert. 1 2 3 loop exitsof with lo >search hi: return Trace binary for rank7 in an ordered array 19 Trace of binary search for rank in an ordered array 20 Binary search: Java implementation Elementary symbol tables: quiz 1 Implementing binary search was public Value get(Key key) { if (isEmpty()) return null; int i = rank(key); if (i < N && keys[i].compareTo(key) == 0) return vals[i]; else return null; } number of keys < key private int rank(Key key) { int lo = 0, hi = N-1; while (lo <= hi) { int mid = lo + (hi - lo) / 2; int cmp = key.compareTo(keys[mid]); if (cmp < 0) hi = mid - 1; else if (cmp > 0) lo = mid + 1; else if (cmp == 0) return mid; } return lo; A. Easier than I thought. B. About what I expected. C. Harder than I thought. D. Much harder than I thought. E. I don't know. (Well, you should!) } 21 FIND THE FIRST 1 22 Binary search: trace of standard indexing client Problem. To insert, need to shift all greater keys over. Problem. Given an array with all 0s in the beginning and all 1s at the end, find the index in the array where the 1s start. key value input 0 0 0 0 0 … 0 0 0 0 0 1 1 1 … 1 1 1 Variant 1. You are given the length of the array. Variant 2. You are not given the length of the array. 0 1 2 3 keys[] 4 5 6 7 8 9 1 2 3 vals[] 4 5 6 N 0 1 0 2 3 4 1 2 2 0 1 1 0 3 0 2 2 4 4 1 1 3 5 0 3 0 S 0 S E A R 1 2 3 E A A S E E S R S C H 4 5 A A C C E E R H S R S entries in gray 5 did not move 6 E X 6 7 A A C C E E H H R R S S X 6 7 2 2 4 4 6 6 5 5 3 3 0 0 A M 8 9 A A C C E E H H R M S R X S X 7 8 8 8 4 4 6 6 5 5 3 9 0 3 P L E 10 11 12 A A A C C C E E E H H H M L L P M M R P P S R R X S S X X 9 10 10 8 8 8 A C E H L M P R S X 8 entries in red were inserted 7 8 9 entries in black moved to the right circled entries are changed values 7 7 0 7 4 6 4 6 4 12 5 9 10 3 5 11 9 10 5 11 9 10 0 3 3 7 0 0 7 7 4 12 5 11 3 0 7 9 10 Trace of ordered-array ST implementation for standard indexing client 23 24 Elementary ST implementations: summary guarantee average case operations on keys implementation search insert search hit insert sequential search (unordered list) N N N N equals() binary search (ordered array) log N N log N N compareTo() 3.1 S YMBOL T ABLES ‣ API ‣ elementary implementations Algorithms ‣ ordered operations R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu Challenge. Efficient implementations of both search and insert. 25 Examples of ordered symbol table API min() get(09:00:13) floor(09:05:00) select(7) keys(09:15:00, 09:25:00) ceiling(09:30:00) max() Ordered symbol table API keys values 09:00:00 09:00:03 09:00:13 09:00:59 09:01:10 09:03:13 09:10:11 09:10:25 09:14:25 09:19:32 09:19:46 09:21:05 09:22:43 09:22:54 09:25:52 09:35:21 09:36:14 09:37:44 Chicago Phoenix Houston Chicago Houston Chicago Seattle Seattle Phoenix Chicago Chicago Chicago Seattle Seattle Chicago Chicago Seattle Phoenix public class ST<Key extends Comparable<Key>, Value> ... smallest key Key max() largest key Key floor(Key key) Key ceiling(Key key) smallest key greater than or equal to key number of keys less than key Key select(int k) key of rank k void deleteMin() delete smallest key void deleteMax() delete largest key Iterable<Key> keys() Iterable<Key> keys(Key lo, Key hi) 27 largest key less than or equal to key int rank(Key key) int size(Key lo, Key hi) size(09:15:00, 09:25:00) is 5 rank(09:10:25) is 7 Examples of ordered symbol-table operations Key min() number of keys between lo and hi all keys, in sorted order keys between lo and hi, in sorted order 28 Algorithms Binary search: ordered symbol table operations summary sequential search binary search search N log N insert N N min / max N 1 R OBERT S EDGEWICK | K EVIN W AYNE 3.2 B INARY S EARCH T REES ‣ BSTs ‣ ordered operations floor / ceiling N log N rank N log N select N 1 ordered iteration N log N N Algorithms F O U R T H ‣ deletion E D I T I O N R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu order of growth of the running time for ordered symbol table operations 29 Binary search trees Definition. A BST is a binary tree in symmetric order. root a left link A binary tree is either: 3.2 B INARY S EARCH T REES a subtree ・Empty. ・Two disjoint binary trees (left and right). right child of root ‣ BSTs null links Anatomy of a binary tree ‣ ordered operations Algorithms ‣ deletion Symmetric order. Each node has a key, and every node’s key is: R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu ・Larger than all keys in its left subtree. ・Smaller than all keys in its right subtree. parent of A and R left link of E S E X A R C keys smaller than E key H 9 value associated with R keys larger than E Anatomy of a binary search tree 3 Binary search tree demo Binary search tree demo Search. If less, go left; if greater, go right; if equal, search hit. Insert. If less, go left; if greater, go right; if null, insert. successful search for H insert G S S E X A E R C X A H R C H G M M 4 BST representation in Java 5 BST implementation (skeleton) Java definition. A BST is a reference to a root Node. public class BST<Key extends Comparable<Key>, Value> { private Node root; A Node is composed of four fields: ・A Key and a Value. ・A reference to the left and right subtree. smaller keys private class Node { /* see previous slide */ } public void put(Key key, Value val) { /* see next slides */ } larger keys private class Node { private Key key; private Value val; private Node left, right; public Node(Key key, Value val) { this.key = key; this.val = val; } } root of BST public Value get(Key key) { /* see next slides */ } BST Node key left BST with smaller keys val public void delete(Key key) { /* see next slides */ } right public Iterable<Key> iterator() { /* see next slides */ } BST with larger keys } Binary search tree Key and Value are generic types; Key is Comparable 6 7 BST search: Java implementation BST insert Get. Return value corresponding to given key, or null if no such key. Put. Associate value with key. inserting L S E X A public Value get(Key key) { Node x = root; while (x != null) { int cmp = key.compareTo(x.key); if (cmp < 0) x = x.left; else if (cmp > 0) x = x.right; else if (cmp == 0) return x.val; } return null; } R ・ ・Key not in tree ⇒ H C Search for key, then two cases: M Key in tree ⇒ reset value. P search for L ends at this null link add new node. S E X A R H C M create new node P L S E X A R H C M reset links on the way up Cost. Number of compares = 1 + depth of node. L P Insertion into a BST 8 9 best case H S C BST insert: Java implementation Tree shape Put. Associate value with key. ・Many BSTs correspond to same set of keys. typical case case search/insert = 1 + depth of node. ・Number of compares for best H E public void put(Key key, Value val) { root = put(root, key, val); } concise, but tricky, recursive code; read carefully! S C private Node put(Node x, Key key, Value val) { if (x == null) return new Node(key, val); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; return x; } best case R E X A typical case X A R H S A worst case C BST possibilities H H C H E R E X S E X worst case C R C S H C X X R A S E S C A X typical case H A R E A R E A R S BottomAline. Tree depends on order of X insertion. worstshape case Cost. Number of compares = 1 + depth of node. C E 10 BST possibilities H 11 R S BST insertion: random order visualization Binary search trees: quiz 1 Ex. Insert keys in random order. In what order does the traverse(root) code print out the keys in the BST? private void traverse(Node x) { if (x == null) return; traverse(x.left); StdOut.println(x.key); traverse(x.right); } A. A C E H M R S X B. A C E R H M X S C. S E A C R H M X D. C A M H R E X S E. S E X A R C H M I don't know. 12 13 Inorder traversal Binary search trees: quiz 2 ・Traverse left subtree. ・Enqueue key. ・Traverse right subtree. What is the name of this sorting algorithm? 1. Shuffle the keys. 2. Insert the keys into a BST, one at a time. public Iterable<Key> keys() { Queue<Key> q = new Queue<Key>(); inorder(root, q); return q; } private void inorder(Node x, Queue<Key> q) { if (x == null) return; inorder(x.left, q); q.enqueue(x.key); inorder(x.right, q); } 3. Do an inorder traversal of the BST. BST key left val right smaller keys, in order A. Insertion sort. B. Mergesort. C. Quicksort. D. None of the above. E. I don't know. BST with larger keys BST with smaller keys key larger keys, in order all keys, in order Property. Inorder traversal of a BST yields keys in ascending order. 14 15 Correspondence between BSTs and quicksort partitioning BSTs: mathematical analysis Proposition. If N distinct keys are inserted into a BST in random order, the expected number of compares for a search/insert is ~ 2 ln N. Pf. 1–1 correspondence with quicksort partitioning. P H T D O E A S U I Proposition. [Reed, 2003] If N distinct keys are inserted into a BST in random order, the expected height is ~ 4.311 ln N. Y C M expected depth of function-call stack in quicksort How Tall is a Tree? How Tall is a Tree? Bruce Reed L Bruce Reed CNRS, Paris, France CNRS, Paris, France reed@moka.ccr.jussieu.fr reed@moka.ccr.jussieu.fr ABSTRACT purpose of this note to p ABSTRACT But… Worst-case height is N – 1. Remark. Correspondence is 1–1 if array has no duplicate keys. [ exponentially small chance when keys are inserted in random order ] 16 ST implementations: summary guarantee average case implementation search search hit insert operations on keys 3.2 B INARY S EARCH T REES sequential search (unordered list) N binary search (ordered array) log N BST insert have: purpose this note that for /3 -- ½ + ~ 3, we Let H~ be the height of a random binaryof search tree toonprove n nodes. Wesearch show tree that on there constants a = 4.31107... Let H~ be the height of a random binary n existshave: THEOREM 1. E(H~) = n d / 3 = 1.95... such that E(H~) = c~logn - / 3 1 o g l o g n + nodes. We show that there existsaconstants a = 4.31107... . We also O(1). 1. E(H~) = ~ l o g nVar(Hn) - / 3 1 o g l=o gO(1) n + O(1) and a n d / 3 = 1.95... such that E(H~)O(1), = c~logn - / 3show 1 o g l othat g n +Var(H~) =THEOREM Var(Hn) = O(1) . O(1), We also show that Var(H~) = O(1). R e m a r k By the definitio Categories and Subject Descriptors given is more R e m a r k By the definition of a, nition /3 = 7"g~" 3~ The firstsugge defi Categories and Subject Descriptors we this will see. E.2 [ D a t a S t r u c t u r e s ] : Trees nition given is more suggestive ofaswhy value is correct, For more information on as we will see. E.2 [ D a t a S t r u c t u r e s ] : Trees consult [6],[7], [1],one [2 For more information on randommay binary search trees, 1. THE RESULTS R eand m a r[8]. k After I announce may consult [6],[7], [1], [2], [9], [4], A binary search tree is a binary tree to each node of which 1. THE RESULTS an alternative p R e m a r k After I announced these developed results, Drmota(unpublish we have associated key; these keys axe drawn from some using A binary search tree is a binary tree to each node of awhich developed an alternative proof ofO(1) the fact thatcompletely Var(Hn) d= totally set and the key at v cannot be larger than illuminate we have associated a key; these keys axeordered drawn from some O(1) using completely different proofs techniques. our two 17 As different at its than the key at its left submit we thehave jou totally ordered set and the key atthev key cannot beright largerchild thannor smaller proofs illuminate different aspectsdecided of the to problem, child. a binary search tree T and a new key k, we and asked thatsame theyjourna be pu the key at its right child nor smaller thanGiven the key at its left decided to submit the journal versions to the insert k into T by traversing the tree starting at the root child. Given a binary search tree T and a new key k, we and asked that they be published side by side. insert k into T by traversing theand treeinserting starting katinto the the rootfirst empty position at which we 2. A MODEL arrive. We traverse the tree and inserting k into the first empty position at which we by moving to the left child of the If we construct a binary current nodeleft if child k is smaller the current key and moving A MODEL arrive. We traverse the tree by moving to the of the than 2. 1, ..., n and i is the fir to current the right otherwise. Given permutation If we some construct a binaryofsearch oftree from a permutation current node if k is smaller than the keychild and moving i at the rootthen: of a set of keys, we construct a binary search tree from this of 1, ..., n and i is the first key inappears the permutation to the right child otherwise. Given some permutation of left the child i contains th permutation by inserting the given intoofanthe tree, at order the root treeof rooted at the a set of keys, we construct a binary search tree from this them iinappears depends only on its theshape orde initially empty tree. left child of i contains the keys 1, ..., i 1 and permutation by inserting them in the given order into an the permutation, mad thein The height Hn of a random binary search T,~ order on n in which depends only tree on the these keys appear initially empty tree. contains theright keys child i + 1,of...,i nodes,search constructed from amad random the starting permutation, the tree rooted at the The height Hn of a random binary tree T,~inonthis n manner order in which only theseonk equiprobable permutation . . , n, is known the keystoi be + 1,close ..., n and the its shape depends nodes, constructed in this manner starting from a random of 1 , .contains Frominthis a l oisg nknown whereto abe= close 4.31107...the is order the unique solution in which these on keys appear the observation, permutation.one equiprobable permutation of 1 , . . to . , n, ber of levels of recursion [2, ~ ) of the equation a log((2e)/a) = 1 (here and elsewhere, From this observation, one deduces that Hn is also the numto a l o g n where a = 4.31107... is the unique solution on (i.e. the Vazfilla version Quicksort of Quick log= is1 the logarithm). First, Pittel[10] thatrequired ber of levels ofshowed recursion when [2, ~ ) of the equation a log((2e)/a) (herenatural and elsewhere, permuation chosen in a H,~/log n --~ 3'showed almostthat surely as (i.e. n --+the c~ version for some of positive Quicksort in the which the first is element log is the natural logarithm). First, Pittel[10] permutation n. 7. This not to exceed c~ [11], the permuation is chosen as the pivot) is appliedofto1,a ..., random H,~/log n --~ 3' almost surely as constant n --+ c~ for some constant positive was known Our observation also allow and Devroye[3] showed that "7 = a, as a consequence of the permutation of 1, ..., n. constant 7. This constant was known not to exceed c~ [11], To ease our the exposit factas that E ( H n ) ~ c~logn. has found that Hn us to down. Our observation also allows construct Tn from top and Devroye[3] showed that "7 = a, a consequence of the Robson[12] a subtree , the co does not much to ease experiment, and down. To our exposition, weofthink of T,~of as Ta~labelling fact that E ( H n ) ~ c~logn. Robson[12] hasvary found thatfrom Hn experiment We will expose the seems to to have a fixed range not depending n. of a subtree of T ~ , upon the complete infinite binary tree. key as does not vary much from experiment experiment, and of width Towith underscore the trelation Devroye and Reed[5] proved Var(Hn) = O((log will expose thelog keyn)2), associated each node of T~. seems to have a fixed range of width not depending upon n. that We the Quicksort, key at t as we therefer pivottoa but this = does not log quite confirm Robson's findings. is the To underscore the It relationship with Devroye and Reed[5] proved that Var(Hn) O((log n)2), exposed then the pivots forhave som the key at t as the pivot at t. Suppose that we but this does not quite confirm Robson's findings. It is the Too, rooted at athe root exposed the pivots for some of theof nodes forming subtree some node further t of T~¢, all for of of of Too, at the that Permission to make digital or hard copies all orrooted part of this workroot for of T ~ . weSuppose have of chosen somefeenode t ofthat T~¢, all of the ancestors t are their in T,~pivot and or classroom use for is granted without provided copies Permission to make digital or hard copiespersonal of all or part of this work the set of keys Kt which w we have chosen their pivots. Then, these choices determine are fee notprovided made or distributed personal or classroom use is granted without that copiesfor profit or commercialadvantageand that subtree T,~ rootedempty) at t, b theon set keys KtTowhich at theof(possibly copies bearadvantage this noticeand andthat the full citation theoffirst page. copy will appear are not made or distributedfor profit or commercial in which we expect the ke otherwise, to page. republish, to post on servers or to redistribute to lists, subtree of T,~ rooted at t, but will have no effect on the order copies bear this notice and the full citation on the first To copy permutation Kt is each equa requires prior specific permissionand/orina fee. which we expect the keys in Kt to appear. ofIndeed otherwise, to republish,to post on servers or to redistributeto lists, in Kt will be equally likel requires prior specific permissionand/or aSTOC fee. 2000 Portland Oregon USApermutation of Kt is equally likely. Thus, each of the keys keys Copyright ACM 2000 1-58113-184-4/00/5...$5.00 in Kt will be equally likely to bethe thenumber pivot. of We let in nt this be STOC 2000 Portland Oregon USA the number of keys in this set and specify the pivot at t by Copyright ACM 2000 1-58113-184-4/00/5...$5.00 N N N equals() ‣ BSTs N N N log N log N N log N ‣ ordered operations compareTo() Algorithms compareTo() R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu Why not shuffle to ensure a (probabilistic) guarantee of log N? 18 ‣ deletion 479 479 Minimum and maximum Floor and ceiling Minimum. Smallest key in table. Floor. Largest key ≤ a given key. Maximum. Largest key in table. Ceiling. Smallest key ≥ a given key. S min() E min floor(G) max()max X A R C max() S min() E X A H R C H M ceiling(Q) M floor(D) Examples of BST order queries Examples of BST order queries Q. How to find the min / max? Q. How to find the floor / ceiling? 20 Computing the floor Computing the floor finding finding floor(S) floor(S) Floor. Find largest key ≤ k ? Case 1. [ key in node x = k ] The floor of k is in the left subtree of x. S S finding floor(S) E E A A E C AC R R H H M HM M C The floor of k is k. Case 2. [ key in node x > k ] E E E H H X SX X R R if (cmp == 0) return x; R S S is is greater greater than than G G so so M floor(G) HM floor(G) must must be be S is greater than G so on on the the left left M floor(G) must be C Case 3. [ key in node x < k ] public Key floor(Key key) { Node x = floor(root, key); if (x == null) return null; return x.key; } private Node floor(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); S S finding floor(G) C AC finding floor(G) X SX X R finding finding floor(G) floor(G) A A if (cmp < 0) on the left The floor of k can't be in left subtree of x: it is either in the right subtree of x or it is the key in node x. 21 A A C AC E Node t = floor(x.right, key); if (t != null) return t; else return x; S S E E H H M C so H M E is less less than than G G so E is floor(G) could be M floor(G) could be is less Eon the right on the than right G so floor(G) could be R R X SX X X A R H C M G is less than S so floor(G) must be on the left S E X A R H C G is greater than E so floor(G) could be M on the right S E X A R H C M floor(G)in left subtree is null S } R onComputing the right the Computing the floor floor function function return floor(x.left, key); S E E A C result X R H M 22 23 Computing the floor function Rank and select BST implementation: subtree counts Q. How to implement rank() and select() efficiently? private class Node { private Key key; private Value val; private Node left; private Node right; private int count; } A. In each node, store the number of nodes in its subtree. subtree count 8 S 6 E 2 A 3 1 2 C H 1 public int size() { return size(root); } private int size(Node x) { if (x == null) return 0; ok to call return x.count; when x is null } number of nodes in subtree X initialize subtree private Node put(Node x, Key key, Value val) count to 1 { if (x == null) return new Node(key, val, 1); int cmp = key.compareTo(x.key); if (cmp < 0) x.left = put(x.left, key, val); else if (cmp > 0) x.right = put(x.right, key, val); else if (cmp == 0) x.val = val; x.count = 1 + size(x.left) + size(x.right); return x; } R 1 M 24 Computing the rank Rank. How many keys < k ? Rank no key in right subtree < k. Case 2. [ key in node x > k ] Rank. How many keys < k ? node count 8 6 S Case 1. [ key in node = k ] All keys in left subtree < k; 25 E 2 A 3 1 2 C H 1 node count 8 6 S Easy recursive algorithm (3 cases!) X E 2 R A 1 M 3 1 2 C H 1 X R 1 M No key in right subtree < k; public int rank(Key key) { return rank(key, root); recursively compute rank in left subtree. Case 3. [ key in node x < k ] } private int rank(Key key, Node x) { if (x == null) return 0; int cmp = key.compareTo(x.key); if (cmp < 0) return rank(key, x.left); else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right); else if (cmp == 0) return size(x.left); } All keys in left subtree < k; some keys in right subtree may be < k. 26 27 BST: ordered symbol table operations summary sequential search binary search BST search N log N h insert N N h min / max N 1 h floor / ceiling N log N h rank N log N h select N 1 h N log N N N ordered iteration 3.2 B INARY S EARCH T REES h = height of BST (proportional to log N if keys inserted in random order) ‣ BSTs ‣ ordered operations ‣ deletion Algorithms R OBERT S EDGEWICK | K EVIN W AYNE http://algs4.cs.princeton.edu order of growth of running time of ordered symbol table operations 28 ST implementations: summary BST deletion: lazy approach To remove a node with a given key: guarantee ordered ops? implementation search insert ・Set its value to null. ・Leave key in tree to guide search (but don't consider it equal in search). average case delete search hit insert key interface delete E sequential search (unordered list) N N N N N N binary search (ordered array) log N N N log N N N ✔ compareTo() BST N N N log N log N ? ✔ compareTo() E equals() A S A S delete I C H ☠ C I R N H tombstone R N Cost. ~ 2 ln N' per insert, search, and delete (if keys in random order), where N' is the number of key-value pairs ever inserted in the BST. Next. Deletion in BSTs. Unsatisfactory solution. Tombstone (memory) overload. 30 31 Deleting the minimum Hibbard deletion To delete the minimum key: To delete a node with key k: search for node t containing key k. ・Go left until finding a node with a null left link. ・Replace that node by its right link. ・Update subtree counts. go left until reaching null left link A S X E Case 0. [0 children] Delete t by setting parent link to null. R H C M return that node’s right link R E M available for garbage collection E X A replace with null link node to delete S 5 R C R H M M available for garbage collection X E X A H M 7 E 1 R H 7 S 5 X A R C update links and node counts after recursive calls private Node deleteMin(Node x) { if (x.left == null) return x.right; x.left = deleteMin(x.left); x.count = 1 + size(x.left) + size(x.right); return x; } S S H C } update counts after recursive calls X E A public void deleteMin() { root = deleteMin(root); deleting C S C H M Deleting the minimum in a BST 32 Hibbard deletion 33 Hibbard deletion deleting E node to delete To delete a node with key k: search for node t containing key k. To delete a node with key k: search for node A H C Case 1. [1 child] Delete t by replacing parent link. S tEcontainingX R M Case 2. [2 children] ・Find successor x of t. ・Delete the minimum in t's right subtree. A C ・Put x in t's spot. search for key E t deleting R update counts after recursive calls S S E X A R C H M node to delete E X H A C S 5 E M node to delete X H A C S E X A M E H available for garbage collection M x search for key E M go right, then go left until reaching null left link x 34 t.left A successor min(t.right) H X deleteMin(t.right) R S E X deleteMin(t.right) M 7 S 5 H A X R C S E min(t.right) R R H M H X x C still a BST successor H C S E A R A t R t.left x has no left child but X don't garbage collect x x go right, then go left until reaching null left link R C replace with child link S deleting E 7 key k. M update links and node counts after recursive calls Deletion in a BST 35 Hibbard deletion: Java implementation Hibbard deletion: analysis Unsatisfactory solution. Not symmetric. public void delete(Key key) { root = delete(root, key); } private Node delete(Node x, Key key) { if (x == null) return null; int cmp = key.compareTo(x.key); if (cmp < 0) x.left = delete(x.left, key); else if (cmp > 0) x.right = delete(x.right, key); else { if (x.right == null) return x.left; if (x.left == null) return x.right; search for key no right child no left child Node t = x; x = min(t.right); x.right = deleteMin(t.right); x.left = t.left; replace with successor } x.count = size(x.left) + size(x.right) + 1; return x; update subtree counts Surprising consequence. Trees not random (!) ⇒ √ N per op. } Longstanding open problem. Simple and efficient delete for BSTs. 36 ST implementations: summary guarantee average case ordered ops? implementation key interface search insert delete search hit insert delete sequential search (unordered list) N N N N N N binary search (ordered array) log N N N log N N N ✔ compareTo() BST N N N log N log N √N ✔ compareTo() equals() other operations also become √N if deletions allowed Next lecture. Guarantee logarithmic performance for all operations. 38 37
© Copyright 2024