Zip file with source files, documentation, &c.
Browsable version of the files above
Tests:
You guys get to write your own tests this time. Good luck, and be sure to make them hard!
This lab will give you a chance to explore applications of
For this part, you want to count the number of times that a sequence X appears as a subsequence of a sequence Y.
Before coding, you should have a good idea what you will be writing. The code for this problem is extremely short, but only if you know what to write. Work out on paper what your DP solution looks like. Submit a paper version to the TA in class or include a an electronic version with your final solution by the due date.
Write methods
public static int countSubsequences(byte[] x, byte[] y) public static Set<List<Integer>> getAllSubsequences(byte[] x, byte[] y)
In this problem we will measure the similarity of two strings by their edit distance, a concept that is widely used in spell checking, speech recognition, plagiarism detection, file revisioning, and computational linguistics. The edit distance measures the number of editing operations it would be necessary to perform to transform the first string into the second. The possible operations are as follows:
Here is an example of transformation from activate to caveat:
D M R D M I M M D a c t i v a t e c a v e a t
The edit distance is the minimum number of edit operations, excluding matches. It is 5 in this example.
Write the method:
that accepts two strings and returns the minimal edit cost. The empty string is considered valid input. If you are careful, you don't have to handle this as a special case. Place this method in the class EditDistance.
The above procedure describes how to compute the edit distance between two strings. We need to recover the optimal alignment itself. The key idea is to retrace the steps of the dynamic programming algorithm backwards. You will write the method:
that accepts two strings and
returns the actual transformation. The
output is a string of characters
D
, I
, R
and
M
that transforms the first
string into the second. Place this
method inside the EditDistance class
as well.
Sometimes it is possible to have multiple transformations of minimal length. Consider the transformation from abba to aba:
M D M M M M D M a b b a a b b a a b a a b a
In this case, you can return any valid transformation.
Write a method
that accepts two strings and returns a Set of all possible transformations, as described in problem 2.2. Place this method inside the EditDistance class as well.
It should go without saying that your submission must have good style. This is needed so that the TAs can understand your code and give you partial credit. It's also a good habit.
As always, start early. This lab requires relatively little code, yet is somewhat conceptually difficult for beginning dynamic programmers. Office hours will be most valuable for you if you come with a worksheet in hand. For problem 1, this would be the handout for part A, for the other problems, it would be a subset of this page. This lab is best done with some thought prior to coding as to what the appropriate helper functions, variables should be.
Your assignment will be graded first by compiling and testing it. After that, we will read your code to determine whether appropriate data structures and algorithms were used, as well as judge the overall style of your code. Programs will be graded both on correctness as well as their ability to deal with so-called edge cases, like empty lists, lists with only one element, and so on. Points will be deducted for poor design decisions and un-commented, or un-readable code as determined by your TA.