Queues and Stacks
Today we talked about a new data structure, the Queue. If you've ever been to the bank, a grocery store, or an amusement park, then you already know what a queue is. It's just a line. If ten people enter a queue, they each have to wait their turn to get to the front of the line. We want a data structure that captures this functionality.
We refer to Queues as a FIFO data structure. FIFO stands for first in, first out. We add objects onto the back of the queue, and then when we want to remove something, we remove the first thing that was added onto the queue. Just like a line at the bank. No one can cheat and cut in line, so every person (or object) has to wait its turn.
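To see FIFO behavior concretely, here is a small sketch that uses Java's built-in java.util.ArrayDeque through the Queue interface. The class name FifoDemo and the method serveAll are made up for illustration and are not from the lecture.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class: shows that a queue hands items back in arrival order.
class FifoDemo {
    // Enqueue every name, then dequeue them all and record the order served.
    static List<String> serveAll(String... arrivals) {
        Queue<String> line = new ArrayDeque<>();
        for (String person : arrivals) {
            line.add(person);          // join the back of the line
        }
        List<String> served = new ArrayList<>();
        while (!line.isEmpty()) {
            served.add(line.remove()); // leave from the front of the line
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println(serveAll("Ann", "Bob", "Cid")); // prints [Ann, Bob, Cid]
    }
}
```

Whoever gets in line first is served first, no matter how many people join afterward.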
Why would we want such a data structure? In class, we discussed several uses. One of the most obvious uses is for simulation purposes. For example, in the supermarket, there are many different Queues. Some of these are specialized express queues, where people with 10 items or fewer can go to get out faster. For example, if Kesden just wants to buy a bottle of soda, he doesn't want to wait 45 minutes while the crazy cat woman (or man) ahead of him finishes ringing up a shopping cart full of assorted cat foods. Customers with small purchases will get very impatient if they have to wait a long time for their stuff, while customers with huge orders will generally be more tolerant of long waits. A grocery store also wants to consider that certain items yield higher profits than others. So they want to make sure that people buying high profit items like soda are kept happy, while they care less about customers who buy nothing but produce, which is not as profitable for the supermarket.
Clearly, the task of optimizing grocery store lines will get very complicated. However, the manager has access to all kinds of data, including what people buy and how long they had to wait in line. So in theory, he can write a computer program that will simulate a typical shopping day, and experiment with different numbers and types of checkout lines to try to optimize his profits. While it's true that most grocery stores probably don't go to this much trouble, it's easy to see how this type of simulation could be important.
Another major application of Queues involves the allocation of computer resources. For example, every time you print something at the cluster, your printing job gets added to a print Queue. In the spirit of fairness, these jobs get printed in the order that they were requested.
Queue Implementations
So how do we implement a Queue? One easy way to do it is with a Linked List. It has a well defined front and back, and we can easily add and remove items at its ends. If we are using a Linked List, we have two options. We can define either the head or the tail to be the front of the Queue. So let's think about the big O costs of these operations.
What if we say that the tail is the front of the Queue? We will have two operations: enqueue, which adds to the back of the Queue, and dequeue, which removes from the front of the Queue. So the cost of enqueue will be the cost of adding to the back of the Queue, which is the head of the LinkedList. This is a simple constant time operation, or O(1). However, what about dequeue? To remove at the tail of a regular (singly linked) LinkedList is O(n)! We need to walk through the entire list each time to find the node just before the tail, which is very expensive. Think about trying to empty a queue of n elements. This would be n dequeues, each of which is O(n)! That means that it would be O(n^2) just to remove everything from the queue!
So instead, we make the head the front of the Queue. Our enqueue now has to add to the tail, which is constant time, O(1), as long as we keep a reference to the tail node. And our dequeue now just has to remove at the head, which is also O(1), since we need only change the head reference. So now we can both add and remove in constant time. This is much better! Of course, if we use a doubly Linked List, either of these methods would be fine, since you can do constant time removals at the tail.
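Here is a minimal sketch of the head-as-front design, assuming a singly linked list that keeps both a head and a tail reference. The names LinkedQueue, Node, enqueue, and dequeue are illustrative choices, not code from the lecture.

```java
import java.util.NoSuchElementException;

// Hypothetical singly linked queue: head is the front, tail is the back.
class LinkedQueue<T> {
    private static class Node<T> {
        T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private Node<T> head; // front of the queue (we dequeue here)
    private Node<T> tail; // back of the queue (we enqueue here)
    private int size;

    // O(1): link the new node after the current tail.
    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        if (tail == null) {
            head = node;      // the queue was empty
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    // O(1): unlink the head node and return its item.
    T dequeue() {
        if (head == null) {
            throw new NoSuchElementException("queue is empty");
        }
        T item = head.item;
        head = head.next;
        if (head == null) {
            tail = null;      // the queue became empty
        }
        size--;
        return item;
    }

    boolean isEmpty() { return head == null; }

    int size() { return size; }
}
```

Neither method contains a loop, which is why both run in constant time regardless of how long the queue is.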
Then we talked about a completely different implementation of a Queue, this time using arrays. One naive implementation using arrays would be to simply declare the front of the array to be the front of the Queue, and then keep track of the index of the next free slot at the back of the Queue. To enqueue, we would simply put the new object into the array at that spot and increment our index. This is a nice, constant time operation. However, what do we do when we want to dequeue? We must remove the object at index 0. So to keep that spot as the front of the Queue, we would have to shift every object in the array over by 1. This, unfortunately, is an O(n) operation.
  0   1   2   3   4   5
-------------------------
| A | B | C |   |   |   |
-------------------------
backOfQueue = 3
First, let's enqueue("D").
  0   1   2   3   4   5
-------------------------
| A | B | C | D |   |   |
-------------------------
backOfQueue = 4
Now let's dequeue().
  0   1   2   3   4   5
-------------------------
|   | B | C | D |   |   |
-------------------------
Now we need to shift everything over 1.
  0   1   2   3   4   5
-------------------------
| B | C | D |   |   |   |
-------------------------
backOfQueue = 3
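The trace above could be coded roughly as follows. This is only a sketch of the naive approach: the class name ShiftingArrayQueue and the choice to store Strings are assumptions for illustration, while backOfQueue follows the diagrams.

```java
// Hypothetical naive array queue: index 0 is always the front.
class ShiftingArrayQueue {
    private final String[] items;
    private int backOfQueue = 0; // index of the next free slot

    ShiftingArrayQueue(int capacity) {
        items = new String[capacity];
    }

    // O(1): write into the next free slot.
    void enqueue(String item) {
        if (backOfQueue == items.length) {
            throw new IllegalStateException("queue is full");
        }
        items[backOfQueue++] = item;
    }

    // O(n): take the item at index 0, then shift every remaining item left by one.
    String dequeue() {
        if (backOfQueue == 0) {
            throw new IllegalStateException("queue is empty");
        }
        String front = items[0];
        for (int i = 1; i < backOfQueue; i++) {
            items[i - 1] = items[i];
        }
        items[--backOfQueue] = null; // clear the vacated last slot
        return front;
    }

    int size() { return backOfQueue; }
}
```

The for loop inside dequeue is where the O(n) cost comes from: every remaining item moves on every removal.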
One solution to this problem is to just have a second variable that keeps track of the front of the queue as well. So now, when we dequeue, we just increment the front. Now our dequeue will work like this...
  0   1   2   3   4   5
-------------------------
| A | B | C | D |   |   |
-------------------------
backOfQueue = 4
frontOfQueue = 0
Now, we dequeue().
  0   1   2   3   4   5
-------------------------
|   | B | C | D |   |   |
-------------------------
backOfQueue = 4
frontOfQueue = 1
We can still easily get the size of our queue by just subtracting the front from the back. And now both enqueue and dequeue are O(1) operations. However, this implementation is still flawed. Every time we dequeue, we create wasted space at the front of our array that can never be used again.
To fix this, we can have our buffer wrap around back to the beginning of the array. Think of the array as a circle. So if our 6 element array has its 5th position occupied, but the 0th position is free, we just enqueue into the 0th position and update our index variables accordingly.
This type of implementation is actually really easy to do using the modulus operator, '%'. Using this, we don't even have to worry about a special case for the wrap-around. Every time we increment the front or the back, we use the modulus operator to divide by the array length and keep the remainder. For example,
backOfQueue = backOfQueue + 1;
backOfQueue = backOfQueue % maxSize;
So now, saying something is at position 6 in a six element array is the same as saying it is at position 0. Position 8 is the same as position 2, and so on. Now we have constant time adds and removes, and we no longer waste space. One caveat: once the indices wrap around, the back can be smaller than the front, so the size is no longer a simple subtraction. Keeping a separate count of the items is the easiest fix.
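Putting the wrap-around idea together, a circular array queue might look like the sketch below. The names frontOfQueue, backOfQueue, and maxSize follow the notes; the class name CircularArrayQueue and the extra count field are assumptions added here, because front and back are equal both when the queue is empty and when it is full.

```java
// Hypothetical circular array queue that wraps both indices with %.
class CircularArrayQueue {
    private final String[] items;
    private int frontOfQueue = 0; // index of the oldest item
    private int backOfQueue = 0;  // index of the next free slot
    private int count = 0;        // tells a full queue apart from an empty one

    CircularArrayQueue(int maxSize) {
        items = new String[maxSize];
    }

    // O(1): store at the back, then advance the back index with wrap-around.
    void enqueue(String item) {
        if (count == items.length) {
            throw new IllegalStateException("queue is full");
        }
        items[backOfQueue] = item;
        backOfQueue = (backOfQueue + 1) % items.length;
        count++;
    }

    // O(1): read from the front, then advance the front index with wrap-around.
    String dequeue() {
        if (count == 0) {
            throw new IllegalStateException("queue is empty");
        }
        String item = items[frontOfQueue];
        items[frontOfQueue] = null; // free the slot for reuse
        frontOfQueue = (frontOfQueue + 1) % items.length;
        count--;
        return item;
    }

    int size() { return count; }
}
```

With a capacity of 3, enqueueing A, B, C, dequeueing A, and then enqueueing D places D in slot 0, the slot A just vacated.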
Stacks
Now let's talk about another data structure that we see in everyday life, the Stack. Think about a stack of paper. We put the first piece of paper right on the table, and the next piece of paper on top of that one, and so on. If we want to read a paper, we'll take the first one off the top.
This type of data structure is called LIFO, or last in first out. When we remove items from a Stack, we remove the item that was most recently added, which is very different from a FIFO Queue.
When talking about stacks, we use special terms. When we add something to a stack, we say we "push" that object onto the stack. When we remove something, we say we "pop" it. Kesden used the analogy of storing cafeteria trays to explain this. The tray storage area is spring loaded, so that the top tray is always at the same height. When we put a tray back, we "push" the spring down. When we remove a tray, the compressed spring "pops" up.
Implementing a stack is actually much simpler than implementing a queue. Like with Queues, we can use either arrays or linked lists.
If we use a linked list, we can just always insert and remove at the head. These are both constant time operations, and it follows the specification of a Stack. The head can be thought of as the top of the stack.
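As a rough sketch of the linked list approach, assuming a singly linked list whose head is the top of the stack (the names LinkedStack and Node are illustrative, not from the lecture):

```java
import java.util.NoSuchElementException;

// Hypothetical linked stack: the head of the list is the top of the stack.
class LinkedStack<T> {
    private static class Node<T> {
        final T item;
        final Node<T> next;
        Node(T item, Node<T> next) {
            this.item = item;
            this.next = next;
        }
    }

    private Node<T> head; // top of the stack

    // O(1): the new node becomes the head and points at the old head.
    void push(T item) {
        head = new Node<>(item, head);
    }

    // O(1): return the head's item and move the head to the next node.
    T pop() {
        if (head == null) {
            throw new NoSuchElementException("stack is empty");
        }
        T item = head.item;
        head = head.next;
        return item;
    }

    boolean isEmpty() { return head == null; }
}
```

Unlike the linked queue, no tail reference is needed, since everything happens at one end.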
Arrays are also great for implementing stacks. We would do it in much the same way as the queue, except that the bottom of the stack is always position 0. We only need to keep track of the top of the stack, which is the index of the next free slot. When we push an object, we put it in at that index and then increment the top. When we pop an object, we decrement the top and then remove the object at that position. We no longer have to worry about wasting space like we did with Queues.
  0   1   2   3   4   5
-------------------------
| A | B | C |   |   |   |
-------------------------
topOfStack = 3
We push("D").
  0   1   2   3   4   5
-------------------------
| A | B | C | D |   |   |
-------------------------
topOfStack = 4
Then we pop() an element off the stack.
  0   1   2   3   4   5
-------------------------
| A | B | C |   |   |   |
-------------------------
topOfStack = 3
Simple. We never have to shift anything or wrap around.
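The array version traced above can be sketched like this. The name topOfStack follows the diagrams and holds the index of the next free slot; the class name ArrayStack and the choice to store Strings are assumptions for illustration.

```java
// Hypothetical array stack: position 0 is the bottom of the stack.
class ArrayStack {
    private final String[] items;
    private int topOfStack = 0; // index of the next free slot

    ArrayStack(int capacity) {
        items = new String[capacity];
    }

    // O(1): store at topOfStack, then increment it.
    void push(String item) {
        if (topOfStack == items.length) {
            throw new IllegalStateException("stack is full");
        }
        items[topOfStack++] = item;
    }

    // O(1): decrement topOfStack, then remove the item stored there.
    String pop() {
        if (topOfStack == 0) {
            throw new IllegalStateException("stack is empty");
        }
        String item = items[--topOfStack];
        items[topOfStack] = null; // clear the slot
        return item;
    }

    int size() { return topOfStack; }
}
```

Pushing A, B, C, and D and then popping once returns D and leaves topOfStack at 3, matching the diagrams.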
So what good are stacks? One major use of stacks in computers involves making method calls. Whenever you have methods that call other methods, Java needs to keep track of where in the code to return to after each method call. Stacks are perfect for this, but we'll talk about that more next time.
Another use of stacks involves order of operations in mathematical expressions. Given the expression...
((4+5)/3)
We can clearly see that the answer is 3. However, for a computer this is a little trickier. The first number the computer sees is 4. Next it sees a '+' sign, so it knows it is going to add something to 4. But what? The computer sees 5, but how does it know that the 5 isn't immediately followed by an operator such as '*' or '/', which takes precedence over addition?
To clear up the ambiguity, let's rewrite the expression using a different notation. When we write 4+5, we call this infix notation. This means that the operator '+' comes in between the operands '4' and '5'. We'll introduce a different notation, called postfix notation, where the operator comes after both operands. In postfix notation, that expression is written as 4,5,+. So the above expression in postfix notation is...
4,5,+,3,/
Now, we can use a stack to solve this expression! So here's our strategy. We'll walk through the list of operators and operands in postfix notation. Whenever we see an operand, like '4' or '5', we'll push it onto a stack. So after reading the first two elements, our stack looks like this.
-----------------
| 4 | 5 |   |   |
-----------------
Now we see a '+', which is a binary operator. When we see a binary operator, we pop off two elements and apply the operator to them. Be careful though. Since the stack is LIFO, the first thing we pop will be on the right hand side of the operator. So if we pop '5', then '4', this corresponds to 4+5, NOT 5+4. Of course, for addition it doesn't matter, but for subtracting and dividing this is very important. So now, we do the operation and get '9'. We then push this result back on the stack and continue walking through our list. After we push on the '3', we have this...
-----------------
| 9 | 3 |   |   |
-----------------
Now, we read a division operator, '/'. So we pop off '3', then '9'. Since the first thing we popped goes on the right hand side, we do the division 9/3 and get a result of 3. We then push this back on the stack.
-----------------
| 3 |   |   |   |
-----------------
Finally, we see that we have finished the expression. We can now just pop the last element off the stack and it will be our answer. Practice putting other expressions into postfix notation and then doing this process to convince yourself that it works.
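If you want to check your practice answers, the whole process can be sketched in a few lines of Java. This is one possible version under some assumptions that go beyond the notes: tokens are separated by commas as in 4,5,+,3,/, operands are integers, division is integer division, and java.util.ArrayDeque serves as the stack. The class name PostfixEvaluator and its methods are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical postfix evaluator for comma-separated integer expressions.
class PostfixEvaluator {
    static int evaluate(String expression) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String raw : expression.split(",")) {
            String token = raw.trim();
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    if (stack.size() < 2) {
                        throw new IllegalArgumentException("missing operand before " + token);
                    }
                    int right = stack.pop(); // popped first, so it is the right-hand operand
                    int left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token)); // an operand: push it
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed expression");
        }
        return stack.pop(); // the last element left is the answer
    }

    private static int apply(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right; // "/": integer division (an assumption)
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("4,5,+,3,/")); // prints 3
    }
}
```

Trying an expression like 9,3,- is a quick way to confirm the operand order: it gives 6, not -6.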