15-200 Lecture 20 (Friday, October 27, 2006)

Binary Search Trees

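Today we talked about a very important type of tree, the Binary Search Tree (BST). In a BST, the parent-child relationship states that the left child contains a smaller value than the parent, and the right child contains a larger value than the parent. More precisely, every value in a node's left subtree is smaller than the node, and every value in its right subtree is larger.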

Left Child < Parent < Right Child

This relationship will give us a complete ordering of the elements. That is, given a Tree containing some set of elements, an inorder traversal of the Tree will ALWAYS walk through the nodes in the same order, even if they are arranged differently.

For example, given the numbers 1 through 7, let's look at two different valid BSTs.

        4                    2
      /   \                /   \
     2     6              1     4
    / \   / \                  / \
   1   3 5   7                3   6
                                 / \
                                5   7

For the left tree, our traversal will first go left. Now, it must traverse the subtree containing 1,2 and 3. To traverse this, it will first visit the left child (1), then visit the parent (2), and finally the right child (3). Next, we go back to the root (4). Now we go to the right subtree containing 5,6,7. First, we visit the left child (5), then the parent (6), and finally the right child (7). So the inorder traversal is 1,2,3,4,5,6,7.

For the right tree, we first visit the left child (1). Then we go back to the root (2). Now we visit the right subtree, whose root is 4. So we first visit 4's left child (3), then we go back and visit the parent (4), and finally we traverse 4's right subtree, which, as in the left tree, will be in the order 5,6,7. So the traversal is again 1,2,3,4,5,6,7.

The moral of the story is that the parent-child relationship in a binary search tree defines a complete ordering. Any inorder traversal of a BST will traverse the Nodes in order. In the case of integers, this will probably be from smallest to largest, or if you have strings, maybe it will traverse them in alphabetical order, or in order by length of the string, depending on how you define your order.
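As a rough illustration of the point above, here is a minimal Java sketch of an inorder traversal run on both example trees. The InorderNode and InorderSketch classes and their helper methods are hypothetical names made up for this sketch; they are not the TreeNode class used in the course.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node type for this sketch (not the course's TreeNode).
class InorderNode {
    int value;
    InorderNode left, right;

    InorderNode(int value, InorderNode left, InorderNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

class InorderSketch {
    // Visit the left subtree, then the node itself, then the right subtree.
    static void inorder(InorderNode node, List<Integer> out) {
        if (node == null) return;   // empty subtree: nothing to visit
        inorder(node.left, out);
        out.add(node.value);
        inorder(node.right, out);
    }

    static InorderNode leaf(int v) {
        return new InorderNode(v, null, null);
    }

    // The left example tree, rooted at 4.
    static InorderNode leftTree() {
        return new InorderNode(4,
                new InorderNode(2, leaf(1), leaf(3)),
                new InorderNode(6, leaf(5), leaf(7)));
    }

    // The right example tree, rooted at 2.
    static InorderNode rightTree() {
        return new InorderNode(2,
                leaf(1),
                new InorderNode(4, leaf(3),
                        new InorderNode(6, leaf(5), leaf(7))));
    }

    static List<Integer> traverse(InorderNode root) {
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(traverse(leftTree()));   // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(traverse(rightTree()));  // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Both trees have different shapes, but the traversal produces the same sorted sequence.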

Building a Binary Search Tree

Now let's talk about how to build a binary search tree given a list of elements. First, we note that an empty tree and a tree with one element are both valid binary search trees. Without more than one item, it is impossible to violate the parent-child relationship.

Now, say we have a tree with one element, and we want to insert a second element. We simply compare the new element with the value stored in the root, and then we have two choices. Either we place it in the left subtree, or we place it in the right subtree. In other words, if the new item is less than the item in the root, we insert the new item in the left subtree. Otherwise, we insert into the right subtree. We can do this recursively, where our base case is inserting into an empty tree, which is simple.
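The recursive insert just described might be sketched in Java as follows. This is only one possible way to write it: InsertNode and InsertSketch are hypothetical names for this example, the values are plain ints rather than Comparable objects, and sending equal values to the right is an assumption taken from the "otherwise" rule above.

```java
// Hypothetical minimal node type for this sketch (not the course's TreeNode).
class InsertNode {
    int value;
    InsertNode left, right;

    InsertNode(int value) {
        this.value = value;
    }
}

class InsertSketch {
    // Inserts value into the subtree rooted at root and returns the
    // (possibly new) root of that subtree.
    static InsertNode insert(InsertNode root, int value) {
        if (root == null) {
            return new InsertNode(value);            // base case: empty tree
        }
        if (value < root.value) {
            root.left = insert(root.left, value);    // smaller: go left
        } else {
            root.right = insert(root.right, value);  // otherwise: go right
        }
        return root;
    }

    public static void main(String[] args) {
        InsertNode root = null;
        for (int v : new int[] {4, 2, 6}) {
            root = insert(root, v);
        }
        // Prints the root followed by its left and right children.
        System.out.println(root.value + " " + root.left.value + " " + root.right.value); // 4 2 6
    }
}
```

Returning the subtree root from each call lets the caller handle the empty-tree case without any special code.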

Let's try this with the sequence of numbers 4,2,6,1,3,7,5.

We first insert the 4 into the empty tree. Next, we insert 2. Since 2 < 4, we insert it into the left subtree. Since the left subtree is empty, 2 becomes the new left subtree.

	
		4
	      /
	    2

Next, we insert 6 into the tree. Since 6 > 4, we insert into the right subtree.

		4
	      /  \
            2      6

Now, we insert 1. Since 1 < 4, we insert into the left subtree. Since 1 < 2, we insert into 2's left subtree.

		4
              /   \
            2       6
          /
        1

Next, let's insert 3. Since 3 < 4, we insert into the left subtree. Since 3 > 2, we insert into 2's right subtree.

		4
	      /   \
            2       6
          /  \
         1    3

Using the same method, we insert 7 and then 5 to get the following tree.

		4
             /    \
            2      6
          /  \   /  \
         1    3 5    7

This is a pretty nice looking tree. It has no holes in it, and it branches out at every possible opportunity. We call this a balanced tree. Notice that the depth of this tree is very small.

What if we insert the objects in the order 1,2,3,4,5,6,7? The result will still be a valid BST, but it will look very different. First, we insert 1 as the root. Then we add 2 to the right subtree.

	1
	 \
	  2

Now, we insert 3. Since 3 > 1, we insert into the right subtree. Since 3 > 2, we insert into 2's right subtree.

	1
	 \
	  2
	   \
	    3

You can probably see where this is going. Each new element will be added in a new level, all the way on the right. Our final tree will look like this.

 1
  \
   2
    \
     3
      \
       4
        \
         5
          \
           6
            \
             7
    - or -

1 - 2 - 3 - 4 - 5 - 6 - 7

This actually looks an awful lot like a linked list. The result is that when our tree looks like this, we get no benefit from using a BST. The depth is the same as the number of elements in the tree. We call this a degenerate tree.
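To make the difference in shape concrete, the following Java sketch builds a tree from each insertion order and measures its height in levels. ShapeNode, ShapeSketch, and heightAfterInserting are hypothetical names used only here, and the insert method is the same simple int-based version assumed for illustration, not the course's code.

```java
// Hypothetical minimal node type for this sketch (not the course's TreeNode).
class ShapeNode {
    int value;
    ShapeNode left, right;

    ShapeNode(int value) {
        this.value = value;
    }
}

class ShapeSketch {
    // Plain recursive BST insert: smaller values go left, everything else right.
    static ShapeNode insert(ShapeNode root, int value) {
        if (root == null) return new ShapeNode(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    // Height counted in levels: an empty tree has 0, a single node has 1.
    static int height(ShapeNode node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Builds a tree by inserting the values in the given order.
    static int heightAfterInserting(int... values) {
        ShapeNode root = null;
        for (int v : values) root = insert(root, v);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 7, 5)); // 3
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7)); // 7
    }
}
```

The same seven values give a height of 3 in one order and 7 in the other.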

Searching a Binary Search Tree

Let's say we want to search a binary search tree. This is easy: given an item to search for, we can just compare it to the root and we instantly know one of three things: 1.) we found what we're looking for, 2.) the object we're looking for is in the left subtree, if it's there at all, or 3.) the object we're looking for is in the right subtree, if it's there at all.

public boolean binarySearch(TreeNode current, Comparable target){
	// Assumes an empty subtree is represented by null.
	if(current == null)
		return false;  // ran off the bottom of the tree: target is not here
	if(target.compareTo(current.getValue()) < 0)
		return binarySearch(current.getLeftSubtree(), target);
	if(target.compareTo(current.getValue()) > 0)
		return binarySearch(current.getRightSubtree(), target);
	else
		return true;  // compareTo returned 0! We found it!
}

This means we only need to travel toward the bottom of the tree once, along a single path. We can do this because we have the assurance that all objects in a node's left subtree are smaller than the parent, and all objects in the right subtree are larger than the parent. So if we're looking for something that is larger than the parent, we know for certain that it CAN'T be found in the left subtree.

This means that our search time is proportional to the maximum depth of the tree. What is this depth? The answer is that it depends on the order in which the elements were inserted. In the worst case, which is the degenerate tree above, the search is linear, or O(n). There is only one path to the bottom of the tree, and it is of length n. If we want to search for 7, we would have to traverse all the way down all 7 levels until we find it.
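One way to see that search cost tracks depth is to count how many nodes a search examines. The sketch below does this for the degenerate tree and for the balanced tree built earlier from 4,2,6,1,3,7,5. StepNode, SearchStepsSketch, build, and nodesVisited are hypothetical names for this example, and the search is written as a loop over ints rather than the recursive Comparable version, purely to keep the counting simple.

```java
// Hypothetical minimal node type for this sketch (not the course's TreeNode).
class StepNode {
    int value;
    StepNode left, right;

    StepNode(int value) {
        this.value = value;
    }
}

class SearchStepsSketch {
    // Plain recursive BST insert: smaller values go left, everything else right.
    static StepNode insert(StepNode root, int value) {
        if (root == null) return new StepNode(value);
        if (value < root.value) root.left = insert(root.left, value);
        else root.right = insert(root.right, value);
        return root;
    }

    // Builds a tree by inserting the values in the given order.
    static StepNode build(int... values) {
        StepNode root = null;
        for (int v : values) root = insert(root, v);
        return root;
    }

    // Returns how many nodes are examined while searching for target,
    // whether or not the target is found.
    static int nodesVisited(StepNode root, int target) {
        int visited = 0;
        StepNode current = root;
        while (current != null) {
            visited++;
            if (target < current.value) current = current.left;
            else if (target > current.value) current = current.right;
            else break;   // found it
        }
        return visited;
    }

    public static void main(String[] args) {
        StepNode degenerate = build(1, 2, 3, 4, 5, 6, 7);
        StepNode balanced = build(4, 2, 6, 1, 3, 7, 5);
        System.out.println(nodesVisited(degenerate, 7)); // 7
        System.out.println(nodesVisited(balanced, 7));   // 3
    }
}
```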

However, in our balanced tree, we only need to check 3 levels to find the 7. So what is the runtime of the search if we have a balanced tree? Well, each level of depth can hold twice as many elements as the previous.

The root level contains 1 item. The second level contains 2 items. The third level contains 4 items.

                50               Level Capacity: 1 = 2^0  Total Capacity = 1 = 2^1 - 1
              /    \
            25      75           Level Capacity: 2 = 2^1  Total Capacity = 3 = 2^2 - 1
         /    \    /   \
        10    32  62    80       Level Capacity: 4 = 2^2  Total Capacity = 7 = 2^3 - 1
      /  \   / \  / \   /  \
     5   12 26 35 55 65 77  85   Level Capacity: 8 = 2^3  Total Capacity = 15 = 2^4 - 1

As you can see, the total capacity of a balanced tree with h levels is equal to 2^h - 1. For the purposes of runtime analysis, we'll forget about the -1. So for a balanced BST containing n objects, the height h is related to n by the formula

n = 2^h

If we take the log (base 2) of both sides, we get the property that

h = log2(n)
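As a quick numeric check of this relationship, the Java sketch below computes the capacity 2^h - 1 exactly and finds the smallest number of levels that can hold n items. CapacitySketch, capacity, and levelsNeeded are hypothetical names invented for this example; the loop keeps the -1 that the analysis drops, so the results are exact rather than approximate.

```java
class CapacitySketch {
    // Total capacity of a tree with the given number of full levels: 2^levels - 1.
    static long capacity(int levels) {
        return (1L << levels) - 1;
    }

    // Smallest number of levels whose total capacity can hold n items.
    static int levelsNeeded(long n) {
        int levels = 0;
        while (capacity(levels) < n) {
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(levelsNeeded(7));        // 3
        System.out.println(levelsNeeded(15));       // 4
        System.out.println(levelsNeeded(1000000));  // 20
    }
}
```

A million items fit in only 20 levels of a balanced tree, which is why a log(n) search is so much better than a linear one.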

Since we determined above that the search time is proportional to the height of the BST, a search in a balanced tree will be O(h), or O(log(n)). However, as we saw earlier, for degenerate trees the search will be O(n). Luckily, trees built from randomly ordered input have a tendency to be roughly balanced on average, so in general searches run in O(log(n)) time. But be aware that big O notation is usually quoted for the worst-case scenario, so for a search in a BST with no balance guarantee, it is still O(n).