16-889 Assignment 5

Q1. Classification Model (40 points)

(1) The test accuracy of my best model: 0.9780
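For context, a minimal PointNet-style classifier of the kind trained here might look like the following. This is an assumed sketch: the layer widths and the classification head are illustrative, not the exact model used for the reported accuracy.

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Minimal PointNet-style classifier: shared per-point MLP + max pool + FC head."""
    def __init__(self, num_classes=3):
        super().__init__()
        # Shared per-point MLP implemented with 1x1 1D convolutions
        self.feat = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):           # x: (B, N, 3)
        x = x.transpose(1, 2)       # (B, 3, N)
        x = self.feat(x)            # (B, 1024, N)
        x = x.max(dim=2).values     # symmetric max pool over points
        return self.head(x)         # (B, num_classes)

logits = PointNetCls()(torch.randn(2, 1000, 3))
print(logits.shape)  # torch.Size([2, 3])
```

The max pool is the key design choice: it makes the global feature invariant to the ordering of input points.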

(2) Visualize a few random test point clouds and mention the predicted class for each (all samples visualized here are correctly classified):

Predicted class: chair

Predicted class: vase

Predicted class: lamp

(3) Visualize at least one failure prediction for each class (chair, vase, and lamp), and provide an interpretation in a few sentences.

GT label: chair

Prediction: lamp

Interpretation: This chair has only two legs, and it is difficult to recognize as a chair even for a human. It may resemble some lamp samples in the training set.

GT label: vase

Prediction: chair

Interpretation: This vase has two large legs, which make it look like a chair. There are also no plants in the vase to help the model recognize what it is, so the model fails on this sample.

GT label: lamp

Prediction: vase

Interpretation: This lamp looks like a vase: its top part resembles plants and its bottom resembles a stand. These elements may lead the model to classify it as a vase.

Q2. Segmentation Model (40 points)

(1) The test accuracy of my best model: 0.9029
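The per-point accuracy reported here (and per object in the examples below) can be computed as follows; this is a sketch with illustrative labels, not the actual evaluation code.

```python
import numpy as np

def seg_accuracy(pred, gt):
    """Fraction of points whose predicted part label matches the ground truth."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return (pred == gt).mean()

# Illustrative part labels for a 6-point cloud: one point is mislabeled
gt   = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
print(seg_accuracy(pred, gt))  # 0.8333333333333334
```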

(2) Visualize segmentation results of at least 5 objects (left: ground truth; right: prediction).

Good examples:

a. prediction accuracy: 0.9850

This chair is easy to segment since its shape is simple.

b. prediction accuracy: 0.9857

Similar to the one above, this chair's shape is clear and has no misleading parts.

c. prediction accuracy: 0.9905

Similar to the ones above, this chair's shape is clear and has no misleading parts, so the different parts are very easy to classify.

Bad examples:

a. prediction accuracy: 0.4817

This sample's parts have no obvious borders, and it is a sofa whose structure differs from common chairs: it has no legs. These may be the reasons the model does not work well here.

b. prediction accuracy: 0.4590

This sample's parts also have no obvious borders (such as the blue and yellow parts on the back), and it has no legs, unlike common chairs. This is a rather unusual chair, and the pillow increases the difficulty further. These may be the reasons the model does not work well here.

c. prediction accuracy: 0.5462

This sample's parts also have no obvious borders, and it has no legs, unlike common chairs. The prediction is similar to the second sample and looks reasonable, yet this sample is very difficult to segment even for humans; its structure is very misleading. These may be the reasons the model does not work well here.

Q3. Robustness Analysis (20 points)

  1. I rotate the input point clouds by several angles about the z, x, and y axes. The test accuracies are as follows:

(1) task cls:

Original (no rotation): 0.9780

Rotate 30 degrees: 0.5037

Rotate 45 degrees: 0.5855

Rotate 60 degrees: 0.3578

(2) task seg:

Original (no rotation): 0.9029

Rotate 30 degrees: 0.4711

Rotate 60 degrees: 0.3426

Rotate 90 degrees: 0.2484

One failure example is shown below (left: ground truth; right: prediction):

We can see that rotation greatly decreases accuracy, which means the model is sensitive to rotation.
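The rotation perturbation can be sketched as below for the z axis (a numpy sketch; the actual evaluation code may construct the rotation differently, and the x- and y-axis cases are analogous):

```python
import numpy as np

def rotate_z(points, degrees):
    """Rotate an (N, 3) point cloud about the z axis by `degrees`."""
    t = np.deg2rad(degrees)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t),  np.cos(t), 0.0],
                  [0.0,        0.0,       1.0]])
    return points @ R.T

p = np.array([[1.0, 0.0, 0.0]])
print(np.round(rotate_z(p, 90), 6))  # [[0. 1. 0.]]
```

Because the model never saw rotated clouds during training, even a global rigid transform like this moves the input far from the training distribution.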

  2. I input a different number of points (1000, 500, 100, 50, 10) per object (by modifying --num_points when evaluating the models in eval_cls.py and eval_seg.py). The test accuracies are as follows:

(1) task cls:

Original (10000 points): 0.9780

1000 points: 0.9717

500 points: 0.9675

100 points: 0.9255

50 points: 0.8195

10 points: 0.2918

(2) task seg:

Original (10000 points): 0.9029

1000 points: 0.8911

500 points: 0.8761

100 points: 0.8099

50 points: 0.7753

10 points: 0.6789

We can see that reducing the number of points also decreases accuracy. A small reduction does not affect it much, but removing a large fraction of the points leads to poor performance.
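The point-count experiment amounts to subsampling each cloud before evaluation. A sketch, assuming uniform random sampling without replacement (the eval scripts may sample differently):

```python
import numpy as np

def subsample(points, num_points, seed=0):
    """Randomly keep `num_points` of an (N, 3) cloud, without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=num_points, replace=False)
    return points[idx]

cloud = np.random.rand(10000, 3)
print(subsample(cloud, 500).shape)  # (500, 3)
```

Since the max pool takes one feature per channel over all points, accuracy degrades gracefully until so few points remain that the object's shape is no longer recognizable.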

Q4. Bonus Question - Locality (20 points)

I use the PointNet++ network (the multi-scale grouping (MSG) version, as in the paper), and I borrow some functions from its PyTorch implementation repo.

Task cls

Test accuracy of Q1: 0.9780

Test accuracy of pointnet++: 0.9790

GT label: vase

Prediction: vase (in Q1, this sample was wrongly predicted as chair)

Task seg

Test accuracy of Q2: 0.9029

Test accuracy of pointnet++: 0.9189

We can see that PointNet++ performs better since it captures locality information.
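The locality comes from grouping points into local neighborhoods before feature extraction. A minimal sketch of the ball-query step underlying PointNet++'s set abstraction is below; `ball_query`, the point values, and the radii are illustrative, not taken from the referenced repo. In the MSG variant, the same centroid is grouped at several radii and the resulting features are concatenated.

```python
import numpy as np

def ball_query(points, centroid, radius, max_k):
    """Indices of up to `max_k` points within `radius` of `centroid`."""
    d = np.linalg.norm(points - centroid, axis=1)
    return np.nonzero(d < radius)[0][:max_k]

# Toy cloud on the x axis; group around the first point at two scales
points = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.5, 0, 0], [2.0, 0, 0]])
centroid = points[0]
for r in (0.2, 0.8):
    print(r, ball_query(points, centroid, r, max_k=3))
```

Each group is then fed through a small PointNet, so features describe a local region rather than only the whole object, which is what helps on samples like the vase above.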