Homework # 8 - Key Database Design & Administration Due: 4/17/2001 |
70-455 Information Resources Management 4/10/2001 50 points |
You are the database administrator for the Carnegie Library System. The database contains the following information.
Double underlining indicates a primary key that is also a foreign key.
Branch(BranchID, Address, City)
Book(Call#, Seq#, BranchID, PatronID, DueDate)
BookTitle(Call#, Title)
TitleAuthor(Call#, AuthorName)
TitleSubject(Call#, Subject)
Patron(PatronID, Name, Address, BranchID, FinesDue)
Seq# is a unique number for each copy of the same book.
BranchID in Patron refers to the branch where a person got his/her library card.
The row totals are as follows:
Table |
# Rows |
Branch |
19 |
Book |
125,000,000 |
BookTitle |
12,500,000 |
TitleAuthor |
17,500,000 |
TitleSubject |
50,000,000 |
Patron |
350,000 |
Currently, the database is centralized and standard disk storage is being used.
The Director of the Library System has come to you because they are experiencing performance problems. Patrons are calling for the return of the card catalog system saying that it would be faster than the computer system that they use to locate their books.
1. Create a denormalized schema that would improve the performance of the "typical" patron queries: search for a book in the current branch by title, subject, or author with call number returned.
Branch(BranchID, Address, City)
Book(Call#, Seq#, BranchID, PatronID, DueDate)
SearchBook(Call#, Title, AuthorName, Subject)
Patron(PatronID, Name, Address, BranchID, FinesDue)
2. Five "warnings" about denormalization were provided in class. Apply these to the denormalized schema you developed for #1. Assume you were drafting a memo to the Director of the Library System. Use the warnings to explain the cost and impact of denormalization to the director.
The "SearchBook" table is not 4th NF.
Increases chance of errors or inconsistencies - a subject change for a book will require multiple updates, which will cause problems if not done correctly.
May result in reprogramming if business rules change - if subjects are changed so that there are main subjects with subtopics for each main subject, much more reprogramming will have to be done than if the original schema was used.
Optimizes based on current transaction mix - the denormalized schema is designed to improve the performance of the patron search. If the mix of transactions changes, this new design could be a hinderance.
Increases duplication and space required - For any title, the number of rows in SearchBook is the number of authors times the number of subjects.
Increases programming complexity - updating subjects or authors is more complex.
3. Assume that the database must remain centralized (stored on a server at the main branch in Oakland). Show how you would use vertical and/or horizontal partitioning of the original database schema to improve patron query performance. What are the costs or impacts of your partitioning plan?
Horizontal partitioning makes the most sense. Partition the Book table by branch and create a separate Book table for each branch. I would also partition the BookTitle, TitleAuthor, and TitleSubject tables by branch, creating separate tables for each branch that only contained the titles, authors, and subjects for books in each branch. I would not partition the Branch or Patron tables.
Costs/Impacts:
Increases programming complexity
Increases duplication and space required
Increases chance of errors or inconsistencies
Reprogramming required if business rules changed - adding/removing a branch
4. The Director has read a few books on SQL and written the following query. She wants a count of the number of over-due books on the subjects related to procrastination checked out by people who have more than $5 in fines. Currently, she manually counts the books by person that are returned. She complains to you that it "never seems to finish." You also discover that the last time she ran the query was when there were complaints from almost every branch about the performance of the database.
SELECT *
FROM Branch as Br, Book as B, BookTitle as BT, TitleAuthor as TA,
TitleSubject as TS, Patron as P
WHERE Br.BranchID = B.BranchID AND B.Call# = BT.Call# AND
BT.Call# = TA.Call# AND BT.Call# = TS.Call# AND
B.PatronID = P.PatronID AND (B.Call#, Seq#) IN
(SELECT Call#, Seq#
FROM Book
Where DueDate < Today) AND
B.Call# IN
(SELECT Call#
FROM TitleSubject
WHERE Subject LIKE "%PROCRAST%")
AND
P.PatronID IN
(SELECT PatronID
FROM Patron
WHERE FinesDue > 5.00)
ORDER BY P.NAME
4. Explain why this query suffers from performance problems.
Return less data rather than more - too much returned
Single query better than sub-queries - sub-queries not needed
Reduce number of tables joined - unused tables included
Reduce sorting - order by unnecessary
5. Assume that the DBMS being used does not enforce concurrency control. One patron had unpaid fines in the amount of $10 when the following three transactions related to this patron were processed at the same time:
(T1) Payment of $5 received in the mail
(T2) Payment of $3.50 made in person at the library
(T3) A book thought to still be checked out was found on the shelves and the patron was credited for a $6.00 fine.
Each of these transactions read the patron record when the total fines due was $10 (before any were committed). The updated patron record was returned to the database in the order shown above.
5A. What will be the fines due for the patron after all three transactions have been executed?
FinesDue = $4
5B. What should be the fines due for the patron after all three transactions have been executed?
FinesDue = -4.50 (credit to patron)
5C. Show the timing and lock usage (LOCK-S, LOCK-X, UPGRADE, RELEASE) for the three transactions if the "most used locking scheme" is used. (This can be a simple table with a numbers down the left for times and a column for each transaction -- for each transaction show when it will impact a lock; use "wait" if a transaction is waiting for another transaction to release a needed lock.)
If all three transactions start at the same time, this will result in a deadlock.
Time |
T1 |
T2 |
T3 |
1 |
Lock-S |
||
2 |
Read |
||
3 |
Lock-S |
||
4 |
Read |
||
5 |
Lock-S |
||
6 |
Read |
||
7 |
Fines = Fines - 5 |
Fines = Fines - 3.5 |
Fines = Fines - 6 |
8 |
Upgrade |
||
9 |
wait |
Upgrade |
|
10 |
wait |
wait |
Upgrade |
11 |
wait |
wait |
wait |
T1 can't upgrade because T2 and T3 have shared locks so it waits. T2 gets control, but can't upgrade because T1 and T3 have shared locks so it waits. Same with T3. In the end, all are waiting for the others to finish first.
In order for these transactions to complete successfully, you must assume that T2 doesn't start until after T1 has upgraded its lock and that T3 doesn't start until after T2 has upgraded its lock. With those assumptions, the execution is as follows.
Time |
T1 |
T2 |
T3 |
0 |
T1 Starts |
||
1 |
Lock-S |
||
2 |
Read |
||
3 |
Fines = Fines - 5 |
||
4 |
Upgrade |
||
5 |
T2 Starts |
||
6 |
Lock-S |
||
7 |
wait |
||
8 |
Write |
wait |
|
9 |
Commit |
wait |
|
10 |
Read |
||
11 |
Fines = Fines -3.5 |
||
12 |
Upgrade |
||
13 |
T3 Starts |
||
14 |
Lock-S |
||
15 |
wait |
||
16 |
Write |
wait |
|
17 |
Commit |
wait |
|
18 |
Read |
||
19 |
Fines = Fines - 6 |
||
20 |
Upgrade |
||
21 |
Write |
||
22 |
Commit |