Assignment 5

Anirudh Chakravarthy (achakrav)

Question 1

Usage:

python train.py --task cls
python eval_cls.py --load_checkpoint best_model

Test accuracy: 97.16%

(Images: one correctly classified and one misclassified example for each of the Chair, Vase, and Lamp classes.)

Question 2

Usage:

python train.py --task seg
python eval_seg.py --load_checkpoint best_model

Test accuracy: 89.86%

Description       Accuracy
Good example 1    0.9949
Good example 2    0.9845
Good example 3    0.9823
Bad example 1     0.4435
Bad example 2     0.4714
Bad example 3     0.4991

(Images: predicted and ground-truth segmentations for each example.)

The segmentation network performs well on examples with a few prominent parts. In the first three examples, each point cloud contains only a few substructures (3-4), and the network segments these large regions well. However, when there are many substructures, i.e., the object becomes more cluttered, the network performs poorly, possibly because it confuses structurally similar regions.

Question 3

Experiment 1

Usage:

python eval_cls.py --load_checkpoint best_model --noise
python eval_seg.py --load_checkpoint best_model --noise

I added noise sampled uniformly from [0, alpha] to each point in the point cloud. For high values of alpha, this corrupts the point cloud completely, but for low values it gives an idea of the network's robustness to random perturbations of the points.
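A minimal sketch of the perturbation, assuming the point clouds are batched as a (B, N, 3) PyTorch tensor (the tensor layout and the way the --noise flag is wired into the actual scripts may differ):

```python
import torch

def add_uniform_noise(points: torch.Tensor, alpha: float) -> torch.Tensor:
    """Add noise drawn uniformly from [0, alpha) to every coordinate of every point.

    `points` is assumed to be a (B, N, 3) batch of point clouds.
    """
    noise = torch.rand_like(points) * alpha  # torch.rand_like samples from [0, 1)
    return points + noise
```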

Cls:

Alpha Accuracy
0 97.16%
0.1 94.54%
0.2 91.71%
0.5 58.38%
1 22.24%

Seg:

Alpha Accuracy
0 89.86%
0.1 86.61%
0.2 79.61%
0.5 48.71%
1 21.31%

Experiment 2

Usage:

python eval_cls.py --load_checkpoint best_model --dropout
python eval_seg.py --load_checkpoint best_model --dropout

This experiment was inspired by pixel attribution techniques and the following blog: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html. Specifically, each point is assigned a saliency given by the norm of the gradient of the output with respect to that point. If a point has a high gradient norm, perturbing it would lead to a large change in the output, which in turn signifies that the point is crucial to the prediction.

For classification, I computed the gradient of the classification prediction with respect to each of the points. Among the points with a non-zero gradient norm, I discarded the top k% (by norm). Intuitively, discarding more and more of these important points should degrade performance.
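A rough sketch of this procedure for a single point cloud, assuming a hypothetical `model` that maps a (1, N, 3) tensor to (1, num_classes) logits (the actual interface in eval_cls.py may differ):

```python
import torch

def drop_salient_points(model, points: torch.Tensor, label: int, k: float) -> torch.Tensor:
    """Discard the top-k% most salient points of a single (1, N, 3) point cloud.

    Saliency is the norm of the gradient of the predicted class score with
    respect to each input point.
    """
    points = points.detach().clone().requires_grad_(True)
    logits = model(points)                  # (1, num_classes)
    logits[0, label].backward()             # gradient of the class score w.r.t. the points

    saliency = points.grad.norm(dim=-1).squeeze(0)   # (N,) per-point gradient norm
    num_nonzero = int((saliency > 0).sum().item())
    num_drop = int(num_nonzero * k / 100)            # k% of the non-zero-gradient points

    order = torch.argsort(saliency, descending=True)
    keep_idx = order[num_drop:]                      # drop the most salient points
    return points[:, keep_idx].detach()
```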

k Accuracy
0% 97.16%
10% 97.06%
20% 96.95%
30% 97.06%
50% 97.06%
70% 97.16%
80% 97.16%

Even after discarding a large fraction of the most crucial points, the network retains a respectable accuracy, so it appears to be robust.

For segmentation, I followed a similar process: gradients of the per-point predictions are computed with respect to each input point. Intuitively, points with a high gradient norm are those where small input changes would alter the segmentation output, i.e. the gradient norm measures how brittle the segmentation is with respect to a given point.
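A sketch of the per-point saliency computation, assuming a hypothetical segmentation `model` that returns (1, N, num_parts) logits and a (1, N) tensor of per-point part labels (again, the real eval_seg.py interface may differ); the top-k% points by this norm are then discarded as in the classification case:

```python
import torch

def segmentation_saliency(model, points: torch.Tensor, seg_labels: torch.Tensor) -> torch.Tensor:
    """Per-point gradient norms for the segmentation network.

    `points` is a (1, N, 3) cloud and `seg_labels` a (1, N) tensor of part labels.
    """
    points = points.detach().clone().requires_grad_(True)
    logits = model(points)                                  # (1, N, num_parts)
    # Sum the score of the labelled part at every point, then backpropagate
    # so each input point receives a gradient.
    score = logits.gather(2, seg_labels.unsqueeze(-1)).sum()
    score.backward()
    return points.grad.norm(dim=-1)                         # (1, N) per-point saliency
```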

k Accuracy
0% 89.86%
10% 89.83%
20% 89.82%
30% 89.92%
50% 90.31%
70% 90.18%
80% 90.05%

Removing these points actually leads to a slight increase in accuracy. Since the increase is very small, the network's predictions at these points are usually already correct, and the network is therefore fairly robust.