Binary Search Trees
-------------------
You have probably seen binary search trees before. You should
understand what they are. Binary search trees is a class of data
structures where
(1) Each node stores a piece of data
(2) Each node has two pointers to two other binary search
trees
(3) The overall structure of the pointers is a tree
(there's a root, it's acyclic, and every node is
reachable from the root.)
Binary search trees are a way to store a update a set of items, where
there is an ordering on the items. I know this is rather vague. But
there is not a precise way to define the gamut of applicatons of search
trees. In general, there are two classes of applications. Those where
each item has a key value from a totally ordered universe, and those
where the tree is used as an efficient way to represent an ordered list
of items.
Some applications of binary search trees:
Storing a set of names, and being able to lookup based on a prefix of
the name. (Used in internet routers.)
Storing a path in a graph, and being able to reverse any subsection
of the path in O(log n) time. (Useful in travelling salesman
problems).
Being able to quickly determine the rank of a given node in a set.
A _symmetric order_ traversal of the tree is defined recursively as follows:
symmetric_order(r) {
if r = NIL then return
symmetric_order(left(r))
print r
symmetric_order(right(r))
}
Commonly, the item stored at a node is called a key. The keys are
commonly stored in the nodes such that a symmetric order traversal of
the tree visits the keys in increasing order. In this case the
following recursive algorithm can search for key k in a tree with root
r.
search(r, k) {
if r = NIL then return NIL
if key(r) = k then return r
if key(r) < k return search(left(r),k)
return search(right(r),k)
}
The generic insertion algorithm works as follows. To insert a key k,
first apply the search algorithm for key k. Assuming k is not in the
tree then eventually a NIL pointer is reached. This nil pointer is
modified to point at a new node containing the key inserted. The
generic deletion algorithm is a bit more complicated, and we won't
describe it here.
Given a set of n keys to store in a tree, there are many different
shaped trees that can be used to store those keys. If the tree is
_balanced_ then all the nodes will have a depth (distance from the root)
of O(log n). Thus, one goal of the binary tree data structure is to
allow all the operations to be carried out in O(log n) time.
Unfortunately, the generic insertion and search procedures do not do
this. For example, if the keys are inserted in sorted order, then tree
is very unbalanced, and searching ends up taking Omega(n) time instead
of O(log n).
A number of different solutions have been devised to deal with this
problem. For example: AVL Trees, red-black trees, self-adjusting trees
(splay trees) and treaps.
All of these methods make use of a local restructuring operation called
a rotation. A rotation in a tree is a local restructuring that
preserves the symmetric order. Here's a general picture:
x y
/ \ / \
A y left rotation --> x C
/ \ <-- right rotation / \
B C A B
Here the upper case letters represent arbitrary (possibly empty)
subtrees. Here's a specific example:
1 2
/ \ left rotation at 1 / \
0 2 -------------------> 1 4
\ / / \
4 0 3 5
/ \
3 5
Treaps are a randomized algorithm that ensures that at all times the
tree is structured "randomly", according to an appropriate definition of
random. lect0118-treaps describes and analyzes treaps.