15-295 Spring '05 Meeting 2 (Wednesday, January 19, 2005)



Readings and Problems

Problems:
- Selected problems from the Universidad de Valladolid archive:
- Problem Set (pdf): 1 2 3 4

Announcements

The course Web space is now in its permanent home: www.andrew.cmu.edu/course/15-295. Similarly, the course AFS space is in its permanent home: /afs/andrew/course/15/295.
You should have received email from us, maybe two, via the course mailing list. If you didn't, please let us know so we can fix it.
Please remember to email both Eugene and Greg with questions or concerns. It'll get you the fastest answer and it'll also help us to stay coordinated.
Code is graded based on functionality, not programming style, "big picture" correctness, etc.
In order to be more flexible and to encourage students to complete all homework assignments, we've relaxed the late submission policy (just a bit). You may submit late work for up to seven days after the due date.
As advertised in the syllabus, late work continues to receive a 50% penalty. This is because we discuss the problems during class. But, we hope that this discussion will help you to complete any problems that you were unable to complete on time.
Please note that if you submit certain homework problems on time and others late, only the late problems will receive the penalty. So, please submit those problems that you have completed before the deadline, even if you will be submitting other problems late.
It is perfectly okay to submit some problems before the deadline and some after. Problems submitted on-time will not be penalized for lateness.
If you cannot solve problems on your own, we encourage you to solve them after participating in the class discussion.
Six students aren't yet officially registered (or known to be auditing). Please register or let us know that you are just "sitting in".

Problem D

A "map" includes a number of segments, specified by the planar coordinates of their endpoints. We should determine the minimal number of segments required for drawing this map.

Consider a simplified problem: All segments are on the x-axis:

In the simplified problem, as shown above, we can sort the segments in increasing order of their left endpoints and then process them from left to right. We denote the array of left endpoints as left[0...n-1] and the array of right endpoints as right[0...n-1]. The algorithm for solving the problem is as follows:
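// Precondition: n >= 1 and the segments are sorted by left endpoint,
// so left[0...n-1] is in increasing order.
int simpleSegments (int left[], int right[], int n) {

  int counter = 1;    // the first segment always starts a new piece
  int r = right[0];   // rightmost point covered so far

  for (int i=1; i < n; i++) {
    if (left[i] > r)    // gap before segment i: a new piece starts
      counter++;
    if (right[i] > r)   // extend the covered range
      r = right[i];
  }

  return counter;
}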
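The precondition above requires sorted input. As a minimal sketch of the sorting step plus the sweep, assuming each segment is given as a (left, right) pair with left <= right, the following C++ function (the name countMergedSegments is ours, not part of the problem) does both:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Counts the pieces covered by 1D segments given as (left, right) pairs.
// Overlapping or touching segments are merged into one piece.
// Assumes left <= right within each pair.
int countMergedSegments(std::vector<std::pair<int, int>> segs) {
    if (segs.empty()) return 0;
    std::sort(segs.begin(), segs.end());  // orders by left endpoint first
    int counter = 1;
    int r = segs[0].second;               // rightmost point covered so far
    for (std::size_t i = 1; i < segs.size(); i++) {
        if (segs[i].first > r) counter++;   // gap: a new piece starts
        r = std::max(r, segs[i].second);    // extend the covered range
    }
    return counter;
}
```

For example, the segments (0,2), (1,3) and (5,6) merge into two pieces.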
Now we return to the original problem with two dimensions. If the segments in two dimensions all belong to the same straight line, we can use the same algorithm. It is really a 1D problem, with a different baseline orientation.

So, the big picture solution to the 2D problem is to divide the segments into groups such that segments in the same group share the same line. In other words, create one group for each base line and include in each group all segments oriented along that line. Then, for each group, determine the minimal number of segments.
For each segment we determine the values a and b in the equation of the line through it, y = a*x + b. If two segments have the same a and b, they are along the same line and therefore in the same group.
For a segment with endpoints (x₁, y₁) and (x₂, y₂), where x₁ is not equal to x₂, the line equation is:

y = x* (y₂-y₁)/(x₂-x₁) + (y₁*x₂-x₁*y₂)/ (x₂-x₁)

In the case above,

a = (y₂-y₁)/(x₂-x₁)
b = (y₁*x₂-x₁*y₂)/ (x₂-x₁)

So, we have the overall algorithm:

For each segment, determine a and b

Divide the segments into groups such that segments are in the same group if they have the same a and b.

For each group, use the simpleSegments algorithm discussed earlier for the 1D problem to determine the minimal number of segments

Add up the resulting minimum number of segments

Implementation issues:

You should use a hash table or BST for indexing (a,b). In C++ the map or hashmap are good choices. In Java TreeMap and HashMap are good choices.

To minimize rounding errors, always use double, not float. Or, better yet, don't do the division. Just keep track of the numerator and denominator and use cross multiplication:

a/b == c/d, iff ad == cb

You should treat vertical lines as a special case, since the equation y = a*x + b doesn't give us a and b for these lines.
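One way to put these points together is sketched below in C++. It is only a sketch under our own assumptions: coordinates are integers, no segment is a single point, and the names Segment and countMapSegments are ours. Instead of storing a and b, it keys each line by integer coefficients (A, B, C) of A*x + B*y = C, reduced by their gcd, which avoids division entirely and handles vertical lines (B = 0) without a special equation.

```cpp
#include <algorithm>
#include <cstdlib>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct Segment { long long x1, y1, x2, y2; };

static long long gcdLL(long long a, long long b) {
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a;
}

// Groups segments by their supporting line, then counts merged pieces per line.
int countMapSegments(const std::vector<Segment>& segs) {
    typedef std::tuple<long long, long long, long long> Line;
    std::map<Line, std::vector<std::pair<long long, long long> > > groups;
    for (const Segment& s : segs) {
        long long A = s.y2 - s.y1, B = s.x1 - s.x2;
        long long g = gcdLL(std::llabs(A), std::llabs(B));  // nonzero: endpoints differ
        A /= g; B /= g;
        if (A < 0 || (A == 0 && B < 0)) { A = -A; B = -B; }  // fix the sign
        long long C = A * s.x1 + B * s.y1;
        // Position along the line: x for non-vertical lines, y for vertical ones.
        long long p = (B != 0) ? s.x1 : s.y1;
        long long q = (B != 0) ? s.x2 : s.y2;
        groups[std::make_tuple(A, B, C)].push_back(
            std::make_pair(std::min(p, q), std::max(p, q)));
    }
    int total = 0;
    for (auto& entry : groups) {
        auto& ivs = entry.second;
        std::sort(ivs.begin(), ivs.end());
        long long r = ivs[0].second;
        total++;
        for (std::size_t i = 1; i < ivs.size(); i++) {
            if (ivs[i].first > r) total++;      // gap: a new piece on this line
            r = std::max(r, ivs[i].second);
        }
    }
    return total;
}
```

The std::map plays the role of the BST index mentioned above; a hash table would work as well, given a hash for the key.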

Problem H

We consider a list of word pairs in some unknown language and a list of their translations. The order of translations is not the same as the order of original word pairs. We need to determine the right translation of each word.
The right idea is to create the bipartite graphs of possible word matches and possible pair matches and then to prune these graphs until there is only one possible match.
For example, we can use the following greedy-search algorithm:
Initialize the search

Count the number of occurrences of each word as the "first word"
Count the number of occurrences of each word as the "second word"
Create a bipartite "word" graph consistent with the word counts.
Ensure the "internal consistency" of the word graph by removing inconsistent edges.
Create a bipartite "pairs" graph consistent with the "word" graph.

Prune the graph

Repeat the following until no edges are removed in a pass:

Ensure the internal consistency of the "pairs" graph by removing inconsistent edges
Ensure the "words" graph is consistent with the "pairs" graph by removing inconsistent edges
Ensure the internal consistency of the "words" graph by removing inconsistent edges
Ensure the "pairs" graph is consistent with the "words" graph by removing inconsistent edges

The word counts are consistent if the word on the left side of an edge and the word on the right side of an edge are used the same number of times as first words and they are also used the same number of times as second words. In other words, for every edge (u,v) the counts for u are the same as the counts for v.
  1 0    *----------*     1 0     (good)
  1 0    *----------*     2 1     (bad)
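As a rough sketch of the initialization step in C++, assuming the input is already parsed into two lists of word pairs (the names countUses and buildWordGraph are ours), we can count how often each word appears as a first and as a second word, and connect two words only when both counts agree:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Pair;
typedef std::pair<int, int> Counts;  // (uses as first word, uses as second word)

// Counts, for every word, its uses as first word and as second word.
std::map<std::string, Counts> countUses(const std::vector<Pair>& pairs) {
    std::map<std::string, Counts> counts;
    for (const Pair& p : pairs) {
        counts[p.first].first++;
        counts[p.second].second++;
    }
    return counts;
}

// Builds the bipartite "word" graph: an edge (u, v) exists only if the
// counts for u are the same as the counts for v.
std::map<std::string, std::set<std::string> > buildWordGraph(
        const std::vector<Pair>& source, const std::vector<Pair>& translated) {
    std::map<std::string, Counts> src = countUses(source);
    std::map<std::string, Counts> dst = countUses(translated);
    std::map<std::string, std::set<std::string> > graph;
    for (const auto& u : src) {
        graph[u.first];  // keep the vertex even if it ends up with no edges
        for (const auto& v : dst)
            if (u.second == v.second) graph[u.first].insert(v.first);
    }
    return graph;
}
```

A word with no surviving edges signals that the input has no valid translation.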
A graph is internally consistent if for every edge (u, v), if u has only one outgoing edge, then v has only one outgoing edge.
   *----------*     (good)

   *----------*     (bad)
   *_____----/
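One possible reading of this rule as a pruning pass is sketched below in C++. It assumes the bipartite graph is stored as a map from each left vertex to its set of right-side candidates; the names Graph and pruneInternal are ours. Whenever a vertex on either side has a single edge, all competing edges at the other endpoint are removed, and the pass repeats until nothing changes.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, std::set<std::string> > Graph;  // left -> right candidates

// Removes edges that violate internal consistency; returns how many were removed.
int pruneInternal(Graph& g) {
    int removed = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        // Left side: if u's only candidate is v, no other vertex may use v.
        for (auto& u : g) {
            if (u.second.size() != 1) continue;
            const std::string v = *u.second.begin();
            for (auto& w : g) {
                if (w.first != u.first && w.second.erase(v) > 0) {
                    removed++;
                    changed = true;
                }
            }
        }
        // Right side: if v is reachable only from u, u may not keep other edges.
        std::map<std::string, std::vector<std::string> > reverse;
        for (auto& u : g)
            for (const auto& v : u.second) reverse[v].push_back(u.first);
        for (auto& v : reverse) {
            if (v.second.size() != 1) continue;
            std::set<std::string>& candidates = g[v.second[0]];
            if (candidates.size() > 1) {
                removed += (int)candidates.size() - 1;
                candidates.clear();
                candidates.insert(v.first);
                changed = true;
            }
        }
    }
    return removed;
}
```

The return value makes it easy to drive the "repeat until no edges are removed in a pass" loop from the pruning step.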
  
   
The "pair" graph is consistent with the "word" graph iff, for every edge ([U₁, U₂], [V₁, V₂]), there are edges (U₁,V₁) and (U₂,V₂).
The "word" graph is consistent with the "pair" graph iff the following are true for every (U,V) in the word graph:

Every vertex [U*] in the pairs graph has an edge ([U*][V*])
Every vertex [V*] in the pairs graph has an edge ([U*][V*])
Every vertex [*U] in the pairs graph has an edge ([*U][*V])
Every vertex [*V] in the pairs graph has an edge ([*U][*V])

If the greedy search produces a one-to-one correspondence we are quite lucky -- this isn't guaranteed. If not, we need to use a recursive search.

Initialize
Recursive search

Prune
If some vertex in the word graph has no edges, return
If every vertex has exactly one edge, print out the correspondence and return -- we win.
Otherwise, find the vertex with the minimum number of edges and, for each edge:

Create copies of the two graphs
Remove this edge from the graph
Repeat this recursive search upon the copies

Implementation Strategy

Implement the greedy search and submit your solution. Especially in our region, this tends to be enough. They often look for good solutions that aren't exhaustive or guaranteed. If this fails, then implement the exhaustive search (ouch!)

Problem E

We use several dice, where each die has a certain number of facets labeled with values. We need to add a new die, with a given number of facets, that ensures given chances of getting certain sums.
For example, consider the following three dice: 1,1,1,2 and 1,2,2,2 and 1,1,1,2,2,2.

Initial Chances

Value     1   2   3   4   5   6
Chances   0   0   9  39  39   9

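Now, add a four-sided die to ensure that the chance of getting a 4 is 9 and the chance of getting a 5 is 48. It is important to note that the chances of getting other sums do not matter. The solution is a die: 1,2,3,3. With it, a 4 can only come from the new 1 on top of an old 3 (9 chances), and a 5 comes from the new 1 on an old 4 or the new 2 on an old 3 (39 + 9 = 48 chances).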
This is a messy problem with no neat solution -- at least we don't have one. The basic approach is to compute the initial chances and to search through all possible labels of the new die.
To make this problem manageable, we use dynamic programming. We use recursion and a table so that partial results shared by several paths are computed only once.
It is important to note that the maximum number of dice is 20 and the maximum value is 50. As a result, the required table can be no more than 20 * (50*20) = 20,000 entries.
Consider the example, where row i gives the chances of each sum after the first i dice:

Dice used   Chances of sum
            1   2   3   4   5   6
1           3   1   0   0   0   0
2           0   3  10   3   0   0
3           0   0   9  39  39   9

The algorithm for initializing proceeds as follows:
int chances[0...20, 0...1000] = { {0, ..., 0}, ..., {0, ..., 0} }

chances[0,0] = 1;   // with no dice, the only possible sum is 0

int i = 1;
for each die do {
  input the next die, with f facets and values sides[0,..., f-1]

  for (k=0; k < f; k++) {
    for (j=sides[k]; j <= 1000; j++) {
      chances[i,j] = chances[i,j] + chances[i-1, j-sides[k]]
    }
  }
  i++;
}
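A runnable C++ version of the same idea is sketched below. It is not the table layout described above: as a simplification of ours, it keeps only the most recent row instead of the full 2D table, and the function name sumChances is hypothetical. It assumes every face value is a positive integer.

```cpp
#include <vector>

// Returns chances, where chances[s] is the number of ways the given dice
// can sum to s. Each die is a list of its face values.
std::vector<long long> sumChances(const std::vector<std::vector<int> >& dice) {
    std::vector<long long> chances(1, 1);  // no dice: one way to make sum 0
    for (const std::vector<int>& sides : dice) {
        int maxFace = 0;
        for (int v : sides)
            if (v > maxFace) maxFace = v;
        // Adding this die can raise the largest sum by at most maxFace.
        std::vector<long long> next(chances.size() + maxFace, 0);
        for (std::size_t s = 0; s < chances.size(); s++)
            for (int v : sides)
                next[s + v] += chances[s];
        chances = next;
    }
    return chances;
}
```

For the three example dice, this gives 9, 39, 39 and 9 chances for the sums 3, 4, 5 and 6.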
From there, we try all possible labels and prune unacceptable combinations. In the worst case, we search through 50⁶, or about 1.6 × 10¹⁰, combinations. It is very slow in theory, but smart pruning can make it work in practice.
for (each possible value of face1) do {
  if (no inconsistency) {
    for (each possible value of face2) do {
      for (each possible value of face3) do {
  
        . . . 

      } 
    } 
  } 
}
An inconsistency occurs if the chance for a given sum is strictly greater than the required chance. For example:

Initial Chances

Sum       1   2   3   4   5   6
Chances   0   0   9  39  39   9

Required Chances

Sum       7   8   9
Chances   9   1   1

In the above example, the possible values for face1 are:

1 - OK
2 - Chances for 8 are too great, prune
3 - Chances for 8 are too great, prune
...

When setting some value for a facet, make the respective update in the table of chances. If some chance value becomes too large, then consistency is violated and we can skip the inner loops.
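The search with pruning might look like the following C++ sketch, which uses recursion rather than nested loops. Everything named here (Target, ways, search, findDie) is hypothetical. The sketch assumes the initial chances are given as an array indexed by sum, tries face values in non-decreasing order so that each die is considered once, and returns the first die found or an empty list if there is none.

```cpp
#include <cstddef>
#include <vector>

struct Target { int sum; long long chances; };  // a required (sum, chances) pair

// Ways to reach `sum` with the old dice, or 0 if the sum is out of range.
static long long ways(const std::vector<long long>& initial, int sum) {
    return (sum >= 0 && sum < (int)initial.size()) ? initial[sum] : 0;
}

static bool search(const std::vector<long long>& initial,
                   const std::vector<Target>& targets, int maxValue,
                   int facesLeft, int minValue,
                   std::vector<long long>& current, std::vector<int>& faces) {
    if (facesLeft == 0) {  // all faces set: every target must match exactly
        for (std::size_t t = 0; t < targets.size(); t++)
            if (current[t] != targets[t].chances) return false;
        return true;
    }
    for (int v = minValue; v <= maxValue; v++) {
        bool ok = true;
        for (std::size_t t = 0; t < targets.size(); t++) {
            current[t] += ways(initial, targets[t].sum - v);   // update the table
            if (current[t] > targets[t].chances) ok = false;   // too large: prune
        }
        faces.push_back(v);
        if (ok && search(initial, targets, maxValue, facesLeft - 1, v, current, faces))
            return true;
        faces.pop_back();
        for (std::size_t t = 0; t < targets.size(); t++)       // undo the update
            current[t] -= ways(initial, targets[t].sum - v);
    }
    return false;
}

std::vector<int> findDie(const std::vector<long long>& initial,
                         const std::vector<Target>& targets,
                         int faceCount, int maxValue) {
    std::vector<long long> current(targets.size(), 0);
    std::vector<int> faces;
    if (search(initial, targets, maxValue, faceCount, 1, current, faces)) return faces;
    return std::vector<int>();
}
```

Only the sums of interest are tracked, which keeps each update cheap.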
There are a few ways to optimize this process:

When updating the table of chances for the new die, consider only the sums (columns) of interest.

Consider only useful values of faces. A value is useful if it is no larger than the "maximum sum of interest" - the "minimum sum of initial chances". In other words, don't consider it if it is too large to be useful.

The implementation can use recursion in place of nested loops, but this will run a bit more slowly. In some cases, it is more difficult to represent the solution using nested loops -- in which case the recursion might be more appropriate.