16-889: Learning for 3D Vision (SP22) - Assignment 5

Late days used: 3

===================================

0. Running the code

Please refer to README.md for instructions on how to run the code and output all results.

Q1. Classification Model (40 points)

- Visualizations for each class:
    - Chairs: (point cloud visualizations)
    - Lamps: (point cloud visualizations)
    - Vase: (point cloud visualizations)
- Failure cases for each class:
    - Chair: The ground truth for this example is chair, while my model's prediction is lamp. This sample does not look like a normal chair
    and is out-of-distribution, so the model confuses it with a lamp.

(visualization of this example)

    - Lamp: The ground truth for this example is lamp, while the prediction is vase. This sample also does not look like a normal lamp and is shaped
    more like a vase, which is probably why the model is confused.

(visualization of this example)

    - Vase: The ground truth for this example is vase, while the prediction is lamp. The model probably mistakes the vase body for the base of a lamp
    and the sticks for the neck of a lamp, which leads to the wrong prediction.

(visualization of this example)

Q2. Segmentation Model (40 points)

(segmentation visualizations of several examples)
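
As with Q1, here is a minimal sketch of a PointNet-style segmentation model that outputs a part label for every point. The class name `PointNetSeg`, the layer sizes, and the number of part classes are assumptions for illustration, not the exact model used here.

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Per-point features + pooled global feature -> per-point part logits (sketch)."""

    def __init__(self, num_parts: int = 6):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(
            nn.Linear(128, 1024), nn.ReLU(),
        )
        # Each point's head sees its own local feature plus the global feature.
        self.head = nn.Sequential(
            nn.Linear(128 + 1024, 256), nn.ReLU(),
            nn.Linear(256, num_parts),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        local = self.point_mlp(points)                           # (B, N, 128)
        global_feat = self.global_mlp(local).max(dim=1).values   # (B, 1024)
        global_rep = global_feat.unsqueeze(1).expand(-1, local.shape[1], -1)
        per_point = torch.cat([local, global_rep], dim=-1)       # (B, N, 1152)
        return self.head(per_point)                              # (B, N, num_parts) logits
```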

- Failure cases:

    - Accuracy: 0.4850 (see the note on this metric after these examples). For this example, the model makes wrong predictions on the bottom (blue) part of the sofa.
        This is understandable because, unlike chairs whose legs can be easily recognized, this sofa does not have legs but a base that is basically an extension of the cushion (red) part. Therefore
        the model gets confused and predicts the base as cushion.
    

(visualization of this example)

    - Accuracy: 0.5092. For this example, the model predicts the legs as the cushion (outputs red rather than blue)
        and recognizes the back as the cushion (outputs red rather than cyan). The legs prediction is understandable: most chairs have four legs, and the model tries to predict four legs (four small blue portions) in the output.
        The back looks somewhat like a pillow, so it is hard to tell whether it should belong to the back or the cushion; therefore this prediction makes sense as well.

(visualization of this example)
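
The accuracy values above can be read as per-point accuracy, i.e. the fraction of points whose predicted part label matches the ground truth. The helper below is a minimal sketch of that metric under this assumption; the actual evaluation code may differ.

```python
import torch

def seg_accuracy(pred_logits: torch.Tensor, gt_labels: torch.Tensor) -> float:
    # pred_logits: (N, num_parts) per-point logits; gt_labels: (N,) integer part labels.
    pred_labels = pred_logits.argmax(dim=-1)
    # Fraction of points whose predicted part label matches the ground truth.
    return (pred_labels == gt_labels).float().mean().item()
```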

Q3. Robustness Analysis (20 points)

Setting

I conduct three experiments to analyze the robustness of my model: scale, offset, and num_points. For scale, I scale the points by a scalar value. For offset, I translate the points by a scalar value. For num_points, I use fewer points for prediction. I report the accuracy of my best model below, in two sections for classification and segmentation separately, and also provide some visualizations of the predictions. The default parameters are scale=1.0, offset=0.0, num_points=10000.
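
The snippet below is a minimal sketch of these three perturbations, assuming they are applied directly to the raw point coordinates: uniform scaling, a constant offset added to every coordinate, and uniform random subsampling down to num_points points. The exact augmentation code in the repository may differ.

```python
import torch

def perturb(points: torch.Tensor, scale: float = 1.0, offset: float = 0.0,
            num_points: int = 10000) -> torch.Tensor:
    # points: (N, 3) point cloud; defaults match the report's default parameters.
    pts = points * scale + offset
    # num_points: keep a uniform random subset of the points (without replacement).
    idx = torch.randperm(pts.shape[0])[:num_points]
    return pts[idx]
```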

Classification

| Scale    | 0.1   | 0.5   | 1.0   | 2.0   | 5.0   |
|----------|-------|-------|-------|-------|-------|
| Accuracy | 0.245 | 0.332 | 0.980 | 0.737 | 0.647 |

We can see that, compared to the default accuracy of 0.980, the model's performance drops even with a small perturbation of the scale. Below are visualizations of the augmented inputs:

(visualizations)

| Offset   | -0.5  | -0.1  | 0.0   | 0.1   | 0.5   |
|----------|-------|-------|-------|-------|-------|
| Accuracy | 0.356 | 0.946 | 0.980 | 0.906 | 0.233 |

We can see that, compared to 0.980, the model's performance also drops when the point clouds are offset. Below are visualizations of the augmented inputs:

(visualizations)

| Num. points | 50    | 500   | 1000  | 5000  | 10000 |
|-------------|-------|-------|-------|-------|-------|
| Accuracy    | 0.630 | 0.969 | 0.972 | 0.977 | 0.980 |

We can see that the model is robust to changes in the number of input points, as long as the samples are drawn uniformly across all points. There is no significant drop in performance until only 50 points remain. Below are visualizations of the augmented inputs:

(visualizations)

Segmentation

| Scale    | 0.1   | 0.5   | 1.0   | 2.0   | 5.0   |
|----------|-------|-------|-------|-------|-------|
| Accuracy | 0.619 | 0.792 | 0.898 | 0.753 | 0.727 |

We can see that the model is also sensitive to the scale, but it is more robust than the classification model. I think this is because, for dense prediction, the model also learns the relationships between points. Below are visualizations of the outputs:

(visualizations)

| Offset   | -0.5  | -0.1  | 0.0   | 0.1   | 0.5   |
|----------|-------|-------|-------|-------|-------|
| Accuracy | 0.528 | 0.859 | 0.898 | 0.866 | 0.582 |

We can see that the model is also sensitive to translation, but again it is more robust than the classification model for a similar reason. Below are visualizations of the augmented inputs:

(visualizations)

| Num. points | 50    | 500   | 1000  | 5000  | 10000 |
|-------------|-------|-------|-------|-------|-------|
| Accuracy    | 0.787 | 0.885 | 0.885 | 0.898 | 0.898 |

We can see that the model is also robust to changes in the number of points. Below are visualizations of the augmented inputs:

(visualizations)

Q4. Bonus Question - Locality (20 points)

(point cloud visualizations)