
15-200 Lecture 21 (Monday, March 20, 2006)

Trees

Today we discussed a new type of data structure, the Tree. Like the linked list, a Tree is a collection of node objects connected together by references. However, unlike a linked list, which contains only a single Node reference to the next element in the list, each Node in a Tree can have references to multiple other Nodes. The result is a structure that branches out like a tree.

In this example, Node B has references to both D and E. We call D and E the "children" of Node B. B is therefore referred to as the "parent" of D and E. Every Node in the tree except one has a parent. This special parentless Node is called the "root" of the tree. Note that the Nodes at the bottom of the Tree do not have any children. These Nodes are called "leaf" Nodes. In this example, D, E, F, and G are all leaves. Finally, Nodes that have both a parent and at least one child are referred to as "interior nodes". B and C are both interior nodes.

Although Tree nodes can have any number of children, the above example is a special type of tree called a Binary Tree. A Binary Tree is a tree in which each Node can have at most 2 children. We spent the rest of class discussing a special type of Binary Tree called a Heap.
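To make the parent/child vocabulary concrete, here is a minimal sketch of a binary tree node in Java. The class name `TreeNode`, its fields, and its methods are illustrative assumptions and were not part of the lecture.

```java
// Hypothetical binary tree node; all names here are illustrative assumptions.
class TreeNode {
    String label;
    TreeNode left;   // null when there is no left child
    TreeNode right;  // null when there is no right child

    TreeNode(String label) {
        this.label = label;
    }

    // A leaf is a node with no children.
    boolean isLeaf() {
        return left == null && right == null;
    }

    // Counts the leaves in the subtree rooted at the given node.
    static int countLeaves(TreeNode node) {
        if (node == null) {
            return 0;
        }
        if (node.isLeaf()) {
            return 1;
        }
        return countLeaves(node.left) + countLeaves(node.right);
    }
}
```

Linking a root to B and C, and those to D, E, F, and G, gives a tree for which `countLeaves` reports the four leaves named above.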

Heaps

A heap is a tree structure defined by an ordering relationship between each parent and its children. In a min-heap, the parent always has a smaller value than its children (there is no ordering among the children themselves), so the minimum value in the tree will always be at the root. In a max-heap, the parent always has a greater value than its children, so the maximum value in the tree will always be at the root. This property allows us to efficiently implement a priority queue, which is a queue where the item with the highest priority is dequeued first. (A normal FIFO queue is like a priority queue where items that were entered first have higher priority.)

In addition to the relationship between the parent and its children, we will impose another property on heaps. Heaps will always have to be full trees (more commonly called complete trees) -- that is, there must be no holes at any depth except for the greatest depth, and all of the holes at the greatest depth are to the right. (That is, we will fill the tree from top to bottom, and from left to right.) This restriction on the shape of the tree will make our implementation easier and more efficient, as we will see later.

Inserting Into A Heap

When we insert into a heap, we need to make sure that the resulting structure is still a heap. So the resulting structure needs to maintain the relationship between each node and its children, and the resulting tree cannot have any holes in it anywhere except for the bottom right of the tree. We will always place the new item in the first available spot (the leftmost open spot at the bottom of the tree), so the tree will continue to be full after we insert. But will it still be a heap?

Let's look at an example of constructing a min-heap:

We will begin by inserting 70 into the heap. Since this is the first item in the heap, it will have to be the root.

Next, we will insert 150 into the heap. Since the tree is full at depth 1 (the root), we will insert it in the leftmost spot at depth 2, which in this case is the root's left child. Since 70 < 150, this is still a min-heap.

Now we will insert 110 into the heap. The next available spot to add an item to the heap is the root's right child, so we will add 110 there. This is also still a min-heap because 70 < 110. (Remember, it does not matter how the siblings relate to each other.)

Next, we will insert 80 into the heap. Since the tree is now full at depth 2, we will add 80 at the leftmost spot at depth 3 (150's left child). This presents a problem, though, because 150 > 80, so the tree still has the right shape, but it is no longer a min-heap.

How do we make this a min-heap again? Whenever we insert, we have to look at the new node's parent and check the relationship. If the new node is smaller than its parent (in a min-heap), then we will swap the two values. After we have done this, we then have to check if the new one is smaller than its new parent. We continue this process of checking and swapping until we either hit the root or reach a spot where the new value is greater than its parent, because in either case we will have a min-heap again.

So, when we insert 80, we will have to swap it with the 150 since 150 > 80. 80's parent will now be 70, so we will stop there since 70 < 80.

Now let's add 30 to the heap. This will start in the second spot at depth 3 (80's right child), and then will have to swap with 80 because 80 > 30, and then will also swap with 70 because 70 > 30, so here 30 will become the new root of the heap. This is consistent with our idea of a min-heap, since 30 is now the smallest value in the heap.

Finally, we will add 10 to the heap. It will start in the third spot at depth 3 (110's left child), and we will have to swap it with 110 because 110 > 10, and then we will also have to swap it with 30 because 30 > 10, so now 10 becomes the root of the heap.

So, when we insert something into the tree, we initially place the item in the bottom, leftmost spot in the tree, and then swap the new value with its parent until we once again satisfy the properties of the heap.
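The insert procedure can be sketched in Java as follows. This is only a sketch under stated assumptions: it stores the heap in a list in level order with the root at index 1 and an unused placeholder at index 0 (the layout these notes introduce in the section on implementing a heap with a Vector), and the names `MinHeapInsertSketch`, `newHeap`, and `insert` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical min-heap insert sketch; class and method names are assumptions.
class MinHeapInsertSketch {
    // Index 0 holds an unused placeholder so that the root sits at index 1.
    static List<Integer> newHeap() {
        List<Integer> heap = new ArrayList<>();
        heap.add(null);
        return heap;
    }

    static void insert(List<Integer> heap, int value) {
        // The first open spot is always the end of the list
        // (bottom level of the tree, leftmost open position).
        heap.add(value);
        int child = heap.size() - 1;

        // Swap upward while the new value is smaller than its parent.
        while (child > 1 && heap.get(child) < heap.get(child / 2)) {
            int parent = child / 2;
            int tmp = heap.get(parent);
            heap.set(parent, heap.get(child));
            heap.set(child, tmp);
            child = parent;
        }
    }
}
```

Inserting 70, 150, 110, 80, 30, and 10 in that order leaves the list as `[null, 10, 70, 30, 150, 80, 110]`, which is the tree built in the example above read from top to bottom and left to right.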

Removing From A Heap

When we remove an item from the heap, we will always take the value at the root. Why? The reason we use a heap is to have easy access to the item that has the highest or lowest value in the current set of items. Like the stack and the queue, limiting the possible behavior of the heap ensures that it will behave consistently.

When we remove the item at the top of the heap, we leave an empty spot. Since the heap cannot have any holes in it except at the bottom-right, we no longer have a heap, so what can we do to reestablish a heap?

Our first concern is that the root of the heap gets the lowest value in the tree. Luckily, we can narrow our search for that value to the root's children. Why can we do this? Well, we know that in a min-heap the parent has a smaller value than its children, so all of the values below root's left child must have a greater value than root's left child, and all of the values below root's right child must have a greater value than root's right child. This means that the minimum value in the heap will be either root's left child or root's right child. That means we can promote the minimum of root's children to the root of the heap.

In this case, that means that 30, the right child of the root, will be moved to the top of the heap.

Now there is a hole where the 30 was, so we will again have to move one of its children up the heap. Since there is only a left child, 110, we will move that to where 30 was.

We are left with a tree that satisfies the properties of the heap, so we are done. Unfortunately, it is not always this easy. Suppose we have the following heap:

If we remove the minimum value, the 70, we will replace it with the 80, and then replace the 80 with the 130. That would leave us with:

The tree we are left with is not a heap, because there is a hole on the bottom left of the tree. So what should we do in this situation? Well, if we are going to have holes in the tree, they should be at the bottom of the tree and to the right of all the values. To fill in this hole, then, we will take the last value in the heap (the bottom rightmost value) and move it to where the hole is.

We've gotten rid of the hole, but now the relationship between parents and children is not maintained because 130 > 110. So, after we fill in the hole, we have to once again move up the tree swapping with the parent until the parent is smaller.

Now, we finally have a heap. To review, first we remove the value from the root of the heap, and then work our way down the tree replacing the removed value with the smaller of the two children. Next we move the last value in the heap to fill in the hole (if there is a hole). Finally, we swap up the tree like we did in the insert so that the tree we are left with still satisfies the properties of a heap.

This, however, is not the best way to remove from a heap. A better way is to record the value at the root, move the bottom-rightmost value in the tree into the root, and remove that last spot from the tree. The tree keeps its shape, but the value now at the root may be larger than its children, so we swap it down with the smaller of its children until it is smaller than both children or it reaches a leaf. This allows for less swapping and less complexity.
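Here is a hedged Java sketch of the simpler removal strategy (move the last value to the root, then swap it down). It assumes the heap is stored in a list in level order with the root at index 1 and an unused placeholder at index 0, and that the heap is not empty; the names `MinHeapRemoveSketch` and `removeMin` are hypothetical.

```java
import java.util.List;

// Hypothetical min-heap removal sketch; class and method names are assumptions.
class MinHeapRemoveSketch {
    // Assumes heap.get(0) is an unused placeholder and the heap is non-empty.
    static int removeMin(List<Integer> heap) {
        int min = heap.get(1);                     // the minimum is always the root
        int last = heap.remove(heap.size() - 1);   // take the bottom-rightmost value
        int size = heap.size() - 1;                // values still in the heap
        if (size == 0) {
            return min;                            // the root was the only value
        }

        heap.set(1, last);                         // the last value fills the root
        int parent = 1;
        while (2 * parent <= size) {               // while parent has a child
            int child = 2 * parent;                // start with the left child
            if (child + 1 <= size && heap.get(child + 1) < heap.get(child)) {
                child++;                           // the right child is smaller
            }
            if (heap.get(parent) <= heap.get(child)) {
                break;                             // heap order is restored
            }
            int tmp = heap.get(parent);
            heap.set(parent, heap.get(child));
            heap.set(child, tmp);
            parent = child;
        }
        return min;
    }
}
```

Starting from `[null, 10, 70, 30, 150, 80, 110]`, one call returns 10 and leaves `[null, 30, 70, 110, 150, 80]`, the same heap reached in the first removal example above.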

Implementing A Heap With A Vector

Up until now, we have been talking about heaps as trees, but how would we implement them? Using a tree to implement the heap leaves us with some questions that have difficult answers. How do the children know who their parent is? How do you keep track of where the next item should be added? In order for children to interact with their parent, we would have to use recursion to go down a path, and then as the recursive calls unfold swap back up the tree to restructure the heap. To keep track of where the next item should go, we could keep a count of the number of items in the heap, and use that number to construct the path down the tree to insert.

Both questions have reasonable answers, but in practice they require complicated code in order to work correctly.

So if we are not going to store the heap as a tree, how should we store it? Before we decide how to store it, let's number the spots in the heap as follows:

We have numbered the items in the heap from top to bottom and from left to right with no gaps in the numbering. These should resemble indexes for the spots in the Heap. The indexes themselves are not what makes this numbering system so valuable. If you look closely, you will notice the following two relationships: the left child's index is exactly twice the parent's index, and the right child's index is twice the parent's index + 1. This means that we can easily get from the parent to one of its children, and we can also easily get from the children to the parent by simply dividing by two. This relationship between the indexes is what makes it possible for us to implement our heap using a Vector.

When we numbered the items in the heap, we started with the root at 1 rather than 0. This simplifies our math considerably, but if we are using a Vector indexes start at 0. What should go in index 0 if the root is at index 1? The easiest solution is to just throw a "null" into index 0 and effectively throw that part of the Vector away. We could start the root at 0, but that complicates the math, and throwing away one index in a Vector is not really a significant waste of memory.
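The index relationships can be written down directly. The sketch below assumes the root is at index 1 and index 0 holds a placeholder, as described above; the class name `HeapLayoutSketch` and its method names are hypothetical.

```java
// Hypothetical helpers for a heap stored with the root at index 1.
class HeapLayoutSketch {
    static int parent(int child) {
        return child / 2;          // integer division drops the remainder
    }

    static int leftChild(int parent) {
        return 2 * parent;
    }

    static int rightChild(int parent) {
        return 2 * parent + 1;
    }

    // Checks min-heap order for values stored in heap[1..heap.length-1];
    // heap[0] is the unused placeholder and is never read.
    static boolean isMinHeap(Integer[] heap) {
        for (int child = 2; child < heap.length; child++) {
            if (heap[parent(child)] > heap[child]) {
                return false;      // a parent is larger than its child
            }
        }
        return true;
    }
}
```

Because every spot other than the root has exactly one parent, checking each index against `parent(index)` is enough to verify the whole heap.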

Heap Sort

So, let me propose this: we use the vector or array implementation of a heap, as discussed above, to sort a list of unsorted numbers. As with our other sorts, we'll view things in terms of two lists, an unsorted list and a sorted list. Initially, the unsorted list will be full and the sorted list will be empty. Also, as before, each iteration will remove one element from the unsorted list and add it to the right place within the sorted list.

For the moment, just to keep the math easy, we'll use elements 1...n, leaving element 0 empty. This way, each parent's left child, if it exists, will be at index 2*p and its right child, if it exists, will be at index 2*p + 1. Similarly, each node's parent will be at index (int) (c / 2).

So, the initial phase is to build the heap. We do this just as described above. Each iteration, we take the item with the lowest index (left-most item) from the unsorted portion of the list and add it to the heap. We do this by swapping it with its parent, and its parent's parent, etc., as necessary, until the heap-order property is restored.

This phase is O(n*log n). The "n" portion of this derives from the fact that we must add each node, one at a time, to the heap. The "log n" portion represents the fact that, in the worst case, we might need to swap the newly inserted node with each node between the leaf level and the root of the tree. Since the tree is fully balanced, the maximum path length is "log n".

Now, we observe that the minimum value is at the top of the heap. So, let's perform a removeMin(). At this point, we have the second lowest value at the top of the heap, and an empty slot in the last position of the array. We use this empty slot to store the old minimum value.

We now view the situation as this. We have a heap in positions 1...n-1 of the array and a sorted list within positions n...n.

We repeat the process by removing the minimum element of the heap using removeMin(), which again creates an empty slot, this time at position n-1. We drop the value we just removed off in this slot. We now have a heap in positions 1...n-2 and a sorted list in positions n-1...n.

Each time we repeat this process, the heap will shrink by one item and the sorted list will grow by one item. So, to sort the whole list will take n iterations. Each iteration involves the removeMin() operation, which is, as you know, O(log n). So, this phase of things is O(n*log n).

If we add both phases together, we end up with O(n*log n). Remember, we throw away the coefficients. "2*n log n" is still O(n*log n).

The only detail is that our list is, depending on what you were expecting, sorted backwards -- from highest to lowest. If, instead, you want the list sorted, as we are accustomed, from lowest to highest, this can be easily accomplished. We can just build a max-heap instead of a min-heap. The items will then be removed from greatest to smallest, and since each removed item is stored at the end of the array, the result is a list ordered from smallest to greatest.
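Both phases can be sketched in Java as follows. This is a minimal sketch, not the code from class: it assumes an `int` array whose element 0 is unused, builds a max-heap by repeated insertion, and then repeatedly moves the maximum to the end. The names `HeapSortSketch`, `heapSort`, `swapUp`, `swapDown`, and `swap` are hypothetical.

```java
// Hypothetical in-place heap sort over a[1..n]; a[0] is unused.
class HeapSortSketch {
    static void heapSort(int[] a) {
        int n = a.length - 1;

        // Phase 1: build a max-heap by adding a[2], a[3], ... one at a time.
        // (a[1] alone is already a heap.)
        for (int i = 2; i <= n; i++) {
            swapUp(a, i);
        }

        // Phase 2: move the maximum into the last heap slot, shrink the heap
        // by one, and restore heap order from the root.
        for (int last = n; last > 1; last--) {
            swap(a, 1, last);
            swapDown(a, 1, last - 1);
        }
    }

    // Swap a value up while it is larger than its parent.
    static void swapUp(int[] a, int child) {
        while (child > 1 && a[child] > a[child / 2]) {
            swap(a, child, child / 2);
            child /= 2;
        }
    }

    // Swap a value down with its larger child within a[1..size].
    static void swapDown(int[] a, int parent, int size) {
        while (2 * parent <= size) {
            int child = 2 * parent;
            if (child + 1 <= size && a[child + 1] > a[child]) {
                child++;                    // the right child is larger
            }
            if (a[parent] >= a[child]) {
                break;                      // heap order is restored
            }
            swap(a, parent, child);
            parent = child;
        }
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

The sorted region grows from the right end of the same array, so no second array is needed.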

Let's look at building a max-heap and then performing removeMax() on it.

   an unsorted vector (index 0 is unused)

   x 6 5 7 2 1 4 9 0 3 8
   0 1 2 3 4 5 6 7 8 9 10

   we insert 6                   we insert 5

   x 6                           x 6 5
   0 1   it is a valid heap      0 1 2   it is still a valid heap

   we insert 7

   x 6 5 7   this is wrong: 7 is greater than its parent (6), so we swap 7 and 6
   0 1 2 3

   x 7 5 6   we insert 2         x 7 5 6 2   we are ok: 2 is smaller than its parent (5)
   0 1 2 3                       0 1 2 3 4

   we insert 1                   we insert 4

   x 7 5 6 2 1                   x 7 5 6 2 1 4   4 is smaller than its parent (6)
   0 1 2 3 4 5                   0 1 2 3 4 5 6

   we insert 9

   x 7 5 6 2 1 4 9   9 is larger than its parent (6)
   0 1 2 3 4 5 6 7   (parent at index (int) 7/2 = 3), so we swap 6 and 9

   x 7 5 9 2 1 4 6   9 is still larger than its parent (7), so we swap them
   0 1 2 3 4 5 6 7

   x 9 5 7 2 1 4 6
   0 1 2 3 4 5 6 7

   You see the picture.

   Now what if we wanted to remove?
   Since we can only remove the root, we start there.

   store 9 in the 0th position

   9   5 7 2 1 4 6   but now we have a hole and have to fill it in,
   0 1 2 3 4 5 6 7   so we pick the last leaf (6)

   9 6 5 7 2 1 4     but 6 is not larger than both of its children (5 and 7),
   0 1 2 3 4 5 6 7   so we swap 6 with the larger child, 7

   9 7 5 6 2 1 4
   0 1 2 3 4 5 6 7   it is a heap again, so we move 9 to the end

   x 7 5 6 2 1 4  |  9
   0 1 2 3 4 5 6  |  7

   now we remove 7

   7   5 6 2 1 4
   0 1 2 3 4 5 6

   we move the last leaf (4) into the hole

   7 4 5 6 2 1       4 is smaller than its children (5 and 6), so we swap 4
   0 1 2 3 4 5 6     with the larger child, 6

   7 6 5 4 2 1
   0 1 2 3 4 5 6     it is a heap again, so we move 7 to the end

   x 6 5 4 2 1  |  7 9
   0 1 2 3 4 5  |  6 7

   You get the idea.
  

Also, if you prefer to avoid wasting the first slot, the parent and child index formulae can be changed slightly to accommodate this off-by-one situation.
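As a sketch of that adjustment, assuming the root is stored at index 0, the formulae become the following. The class name `ZeroBasedHeapIndexSketch` and its method names are hypothetical.

```java
// Hypothetical index helpers for a heap whose root is at index 0.
class ZeroBasedHeapIndexSketch {
    static int leftChild(int parent) {
        return 2 * parent + 1;     // same value as 2*(parent+1) - 1
    }

    static int rightChild(int parent) {
        return 2 * parent + 2;     // same value as 2*(parent+1)
    }

    // Only meaningful for child >= 1; the root (index 0) has no parent.
    static int parent(int child) {
        return (child - 1) / 2;    // same value as (child+1)/2 - 1 for child >= 1
    }
}
```

The comments show the equivalent "shift to 1-based, apply the old formula, shift back" forms, which is how the implementation below writes them.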

Heap Sort Implementation

Note: this implementation is for a heap whose first index is 0, and it is not at all what we did in class. I am hoping that there will be better implementation code in the next lecture, but I am putting this here in case we don't have better implementation code.
class HeapSortExample extends SortExample
{
  public HeapSortExample(int how_many, int min_val, int max_val,
                         int sorted)
  {
    super (how_many, min_val, max_val, sorted);
  }


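  // Restores max-heap order by swapping the value at new_val_index with its
  // parent until the parent is at least as large or the value reaches the root.
  private void swapUp(int new_val_index)
  {
    // With the root at index 0, the parent of index c is (c+1)/2 - 1.
    int parent_index = (new_val_index+1)/2-1;

    while (new_val_index > 0)
    {
      if (numbers[new_val_index] > numbers[parent_index])
      {
        swapNumbers (new_val_index, parent_index);
        new_val_index = parent_index;
        parent_index = (parent_index+1)/2-1;
      }
      else
        break;
    }
  }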


  private void buildMaxHeap()
  {
    
    
    for (int insert_me=0; insert_me < numbers.length; insert_me++)
      swapUp (insert_me);
  }


  private void sortMaxHeap()
  {
    int hole;
    int temp;

    // Each pass moves the current maximum (the root) into the last heap slot,
    // shrinking the heap by one and growing the sorted region at the end.
    for (int remove=0, last_heap=numbers.length-1; remove < numbers.length;
         remove++)
    {
      temp = numbers[last_heap];          // last leaf, which must be re-placed
      numbers[last_heap--] = numbers[0];  // root goes into the sorted region

      // Walk the hole down from the root, promoting the larger child each time.
      for (hole=0; hole <= last_heap; )
      {
        int left = 2*(hole+1)-1;          // 0-based left child index
        int right = 2*(hole+1);           // 0-based right child index

        if (left > last_heap)
        {
          // No children: drop the saved leaf into the hole, then swap it up.
          numbers[hole] = temp;
          swapUp(hole);
          break;
        }

        if (right > last_heap || numbers[left] > numbers[right])
        {
          // Only a left child exists, or the left child is the larger one.
          numbers[hole] = numbers[left];
          hole = left;
        }
        else
        {
          numbers[hole] = numbers[right];
          hole = right;
        }
      }
    }
  }