Introduction: What is a transaction?
Today we begin discussion of another useful abstraction, namely that of a transaction. A transaction, traditionally associated with databases, is a single unit of work. Transactions are an all or nothing proposition. That is, when we run a transaction it must either successfully run to completion or else it must not change any state, leaving the system as if it had never been run. Furthermore, transactions guarantee the ACID (Atomicity, Consistency, Isolation, Durability) properties.Atomicity and isolation mean that all of the changes caused by the transaction occur "at the same time". That is, no other process is ever permitted to see an intermediate state of the database while things are being changed. Consistency requires that a transaction leave the database in a "good" state. Durability requires that any change to the database that has been reported to the user as having succeeded must have a lasting effecting on the database and cannot subsequently be lost.
An example: Banking
To see how this abstraction can be useful, we consider an example from banking. Consider the following database of bank accounts:
Account # Owner Type Balance 15 Jie Savings $1,000,000 32 Jie Checking $5 36 Alex Savings $42 74 Alex Checking $10,000 45 Erin Savings $5,000 16 Erin Checking $315 27 Victor Savings $0.10 88 Victor Checking $27,000,000,000 Suppose, for a moment, that we want to transfer money from one account to another or withdraw money from an ATM. We need to be very careful to get this right, otherwise someone will lose money and be very unhappy. If, for instance, Erin withdraws $15 from her checking account then the transaction had better update her account value to $300 and the change better be durable or the bank will lose money. If Alex attempts to transfer $1,000 from her savings to her checking, then the transaction had better have no effect. If we add $1,000 to her checking acount before realizing there are insufficient funds in her savings account, then we must undo this or the bank will be out a grand. Thus we see that we want the interactions to be all or nothing. We don't want to allow the state of our database to be changed by a failed transaction. Finally, we need our interactions with the system to be atomic. If Victor walks up to two ATMs and tries to withdraw $27b from both of them at the same time, we don't want them to simultaneously check the account balance, confirm that he has sufficient funds, dispense $27b each, and write $0 back as the balance for his checking account. If we allowed this, then the bank would be $27b and almost certainly bankrupt. Thus, we need a guarantee of atomicity. If we could draw a magic box around some code, call it a transaction and get the ACID properties described above, then this would provide the properties we need to have a functional banking system.
Implementing Transactions
In order to implement transactions efficiently, we run the code within the transaction and log all input and output without letting it update the actual database. Thus, we keep a record of all the writes that it attempts and all of the reads it performs without needing to change any values in the database. Because we can do this without changing the contents of the database, performing an ABORT in the event that the transaction fails to complete is a trivial operation. If an ABORT occurs, we can simply delete our log and the the database is left unchanged. This design is what permits us the nothing portion of 'all or nothing'.When a transaction completes successfully, we perform an operation called "commit" which causes all of the changes to be written out to the database and become visible to reads from other transactions as one atomic operation. We can do this using locking like we talked about earlier in the course. We need to lock every record in the database that we are writing to ensure that all our changes show up atomicly. Furthermore, we need to lock every record that we read and verify the value has not changed since we read it. If we fail to do this then we will run into the problem discussed above where Victor is able to defraud the bank. So, the commit operations consists of locking (in some consistent order) every record that we read in or wrote out as part of the transaction, rereading all of the records and confirming that they have not changed since we last wrote them, and writing out the new values for any records that the transaction has changed.
What happens if, as we're committing a transaction, we discover that one of the records read in by the transaction has changed? In this case, notice that it's not too late to ABORT the transaction. We can still throw away our logs and give up on the transaction as nothing in the database has been changed yet. Note, though, that we don't want to report an ABORT back to the user as there is no reason why the transaction should have failed. We were simply unlucky as some other transaction happened to mutate the state that we were looking at. We can restart the transaction, running it again from the beginning in hopes that we get luckier this time around. We can continue this process of attempting to run the transaction and aborting it due to changes in read records until we get lucky and make it through the transaction with no other transaction getting in the way. Occasionally, however, it may seem that we have little hope of doing this as time and again something changes on us. In such an event we can lock everything we need (or possibly the entire database) and run the transaction to completion. This will prevent any other transaction from getting in our way, but will also reduce the concurrency of the system as we will be locking some or all of it for the duration of the entire transaction rather than just the commit phase.