Assignment 5

Late Days: 3


Q1. Classification Model

Training:

python train.py --task cls

Evaluation (with visualization):

python eval_cls.py --load_checkpoint best_model --visualize

The model checkpoints will be stored in ./checkpoints/cls and visualizations in ./output/cls.

Test accuracy of best model: 96.85 %

Correct Predictions

Input Point Cloud    Correct Prediction
(image)              Lamp
(image)              Lamp
(image)              Vase
(image)              Vase
(image)              Chair
(image)              Chair

Incorrect Predictions

Interestingly, all the chairs in the test set were correctly classified by the model. The only misclassified examples were from the lamp and vase categories.

Input Point Cloud    Ground Truth    Prediction
(image)              Lamp            Vase
(image)              Lamp            Vase
(image)              Vase            Lamp
(image)              Vase            Lamp

Interpretation

The fact that no chairs were misclassified suggests that the distribution of chairs is the most distinct of the three classes, well separated from lamps and vases. Indeed, browsing through the dataset, this does appear to be the case visually.

The inter-class confusion between the lamp and vase categories can be attributed to overlap between the distributions of these two classes. For instance, looking at the first two point clouds, even I as a human am inclined to label them as vases. I don't see a bulb or any other source of illumination in either point cloud and am left wondering whether the label itself is incorrect.

Even for the fourth point cloud, I am almost evenly split on whether it is a vase or a lamp. The bulbous shape of the vase certainly looks like something that could emit light in a lamp. Similarly, the flower/blossom in the third point cloud looks very similar to a bulb; if it weren't for the leaf behind it, I might also have labelled it a lamp.

Q2. Segmentation Model

Training:

python train.py --task seg

Evaluation (with visualization):

python eval_seg.py --load_checkpoint best_model --visualize

The model checkpoints will be stored in ./checkpoints/seg and visualizations in ./output/seg. Each output gif's prefix is the number of correctly classified points in the point cloud.
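As a rough sketch of how such a prefix could be produced (the helper name gif_name and the example labels below are hypothetical, not part of the starter code):

```python
import torch

def gif_name(pred_labels: torch.Tensor, gt_labels: torch.Tensor, out_dir: str = "./output/seg") -> str:
    """Build the output gif path, prefixed with the number of correctly
    segmented points. pred_labels / gt_labels: (N,) per-point class indices."""
    num_correct = int((pred_labels == gt_labels).sum().item())
    return f"{out_dir}/{num_correct}_pred.gif"

# Hypothetical example: 10000 points, 6 part labels, 500 points mislabelled
gt = torch.randint(0, 6, (10000,))
pred = gt.clone()
pred[:500] = (pred[:500] + 1) % 6  # corrupt 500 predictions
print(gif_name(pred, gt))          # ./output/seg/9500_pred.gif
```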

Test accuracy of best model: 89.92 %

Good Predictions

Ground Truth    Prediction    Accuracy
(image)         (image)       99.72 %
(image)         (image)       99.56 %
(image)         (image)       99.53 %
(image)         (image)       99.50 %
(image)         (image)       99.24 %

Bad Predictions

Ground Truth    Prediction    Accuracy
(image)         (image)       41.52 %
(image)         (image)       54.21 %
(image)         (image)       55.25 %
(image)         (image)       62.49 %

Interpretation

Firstly, all of the "bad" predictions are of chairs that differ considerably from what one would imagine as a canonical/typical chair, and thus appear to be outliers with respect to the distribution of chairs in our dataset. Secondly, the definition of the different segments of a chair seems somewhat ambiguous from instance to instance, which even I as a human have difficulty resolving.

For instance, in the first image, the model predicted the lower half of the chair as "base", whereas the ground truth shows it as part of the armrest. This seems like an unnatural choice to me, because as humans we would be more likely to segment the bottom half of this chair as the base rather than an extension of the armrest.

This phenomenon shows up again in the second image, where the model predicts the armrests to extend much further down the object, but is penalized because the ground truth labels that lower region as the base.

The third point cloud appears to be a clear outlier because its structure does not naturally decompose into a base, headrest, armrest, etc., the way the other chairs in the dataset do.

Finally, the fourth point cloud is also an out-of-distribution example: it is a folded chair, whereas most of the other objects show a chair fully unfolded.

Q3. Robustness Analysis

3.1 Vary number of points per object

Evaluate the models with a different number of points per object (--num_points) than the model was trained on.
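A minimal sketch of what this subsampling could look like, assuming the evaluation batch is a (B, N, 3) tensor; the actual sampling in eval_cls.py / eval_seg.py may differ (e.g. it may take the first NUM points rather than a random subset):

```python
import torch

def subsample_points(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Randomly keep num_points points per object.
    points: (B, N, 3) batch of point clouds with N >= num_points."""
    idx = torch.randperm(points.shape[1])[:num_points]  # one random subset shared by the batch
    return points[:, idx, :]

# Example: evaluate with 2500 of the original 10000 points per object
batch = torch.randn(4, 10000, 3)
print(subsample_points(batch, 2500).shape)  # torch.Size([4, 2500, 3])
```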

Experiment 1

Classification model

python eval_cls.py --load_checkpoint best_model --num_points NUM

where NUM is one of the values listed in the table below.

Num Points Accuracy
10000 96.85 %
8000 96.64 %
5000 96.85 %
2500 96.95 %
1000 96.01 %

The accuracy from Q1 is in the first row of the table above.

Experiment 2

Segmentation model

python eval_seg.py --load_checkpoint best_model --num_points NUM

where NUM is one of the values listed in the table below.

Num Points Accuracy
10000 89.92 %
8000 89.92 %
5000 89.93 %
2500 89.66 %
1000 88.74 %

The accuracy from Q2 is in the first row of the table above.

Thus, we conclude that both the classification and segmentation models are fairly robust to the number of sampled points.

3.2 Rotate the input point clouds

During evaluation, I rotated the points about the X-axis because the points are fairly spread out along this axis. Rotating about the Y-axis did not visibly change the point clouds much, and rotating about the Z-axis produced orientations similar to rotating about the X-axis. The rotation is performed using a PyTorch3D transform object.
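A minimal sketch of this rotation step, assuming the test batch is a (B, N, 3) tensor; the transform used here (RotateAxisAngle) is one of PyTorch3D's Transform3d subclasses, though the actual script may construct the transform differently:

```python
import torch
from pytorch3d.transforms import RotateAxisAngle

def rotate_points(points: torch.Tensor, degrees: float, axis: str = "X") -> torch.Tensor:
    """Rotate a (B, N, 3) batch of point clouds about a coordinate axis
    using a PyTorch3D transform object."""
    rot = RotateAxisAngle(angle=degrees, axis=axis, degrees=True, device=str(points.device))
    return rot.transform_points(points)

# Example: rotate the test point clouds by 30 degrees about the X-axis
batch = torch.randn(4, 10000, 3)
print(rotate_points(batch, 30.0).shape)  # torch.Size([4, 10000, 3])
```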

Experiment 3

Classification model

python eval_cls.py --load_checkpoint best_model --visualize --rotate DEGREES --exp_name NAME

where DEGREES is how many degrees we want to rotate the input by. The visualizations will be stored in ./output/NAME.

Rotation (degrees)    Example Input    Accuracy
0                     (image)          96.85 %
15                    (image)          92.24 %
30                    (image)          77.54 %
45                    (image)          54.25 %
60                    (image)          32.32 %
75                    (image)          27.91 %
90                    (image)          27.38 %

The accuracy from Q1 is in the first row of the table above.

Experiment 4

Segmentation model

python eval_seg.py --load_checkpoint best_model --visualize --rotate DEGREES --exp_name NAME

where DEGREES is how many degrees we want to rotate the input by. The visualizations will be stored in ./output/NAME.

Rotation (degrees)    Example Ground Truth    Accuracy
0                     (image)                 89.92 %
15                    (image)                 83.40 %
30                    (image)                 70.92 %
45                    (image)                 49.41 %
60                    (image)                 34.50 %
75                    (image)                 30.20 %
90                    (image)                 26.39 %

The accuracy from Q2 is in the first row of the table above.

Thus, we can see that both the classification and segmentation models are robust to rotation of the input point clouds up to about 15 degrees, beyond which their accuracies decay rapidly.