Assignment 5

Paritosh Mittal (paritosm)

Late Days used : 5 days

1 Classification Model

Test accuracy of the best model :: 96.22%

Correct prediction visualizations

[Visualization: three rows of five correctly classified point clouds each, with ground truth and prediction per sample — Row 1: GT Chair / Pred Chair; Row 2: GT Vase / Pred Vase; Row 3: GT Lamp / Pred Lamp]

Incorrect prediction visualizations

[Visualization: two rows of four misclassified point clouds — Row 1: GT Chair / Pred Vase, GT Vase / Pred Chair, GT Vase / Pred Chair, GT Vase / Pred Lamp; Row 2: GT Lamp / Pred Vase for all four samples]

In the visualizations above, I notice that PointNet does a reasonable job of predicting object classes from point clouds. On visualizing the failure cases, it is evident that these cases are genuinely tricky. For example, the chair in the first row, first column of the incorrect predictions, which is classified as a vase, has geometry very similar to the vases in the dataset. Similarly, there is considerable variance within the Lamp class, and these unusual shapes confuse the model (as expected).

2 Segmentation Model

Test accuracy of the best model :: 90.10%

Good quality prediction visualizations

[Visualization: five good-quality chair segmentations, ground truth (top) vs. prediction (bottom)]
Per-sample accuracy: 99.96%, 99.28%, 99.36%, 99.55%, 99.68%

Bad quality prediction visualizations

[Visualization: five bad-quality chair segmentations, ground truth (top) vs. prediction (bottom)]
Per-sample accuracy: 51.55%, 50.87%, 50.97%, 53.84%, 54.06%

Here I visualize five good and five bad predictions for point cloud segmentation. Quantitatively, the model does a good job at segmentation. For the failure cases, it is evident from the visualizations that these chairs have structures quite different from the general notion of a chair. There are also no clear boundaries between the legs and armrests (columns II, III, IV, and V), which understandably confuses the model. I would also argue that the ground truth for the third example is itself wrong and that the prediction is actually closer to what I consider the correct segmentation. This ambiguity results in poor qualitative results.

3 Robustness Analysis

3.1 Analysis with respect to points

Here I consider the change in model performance with respect to the number of points in a point cloud. I add an argument --do3.1 to the eval scripts. When invoked, it computes the test accuracy for objects represented by {100, 500, 1000, 5000, 10000} points.
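
A minimal sketch of the point-count subsampling this evaluation relies on (the tensor shapes are illustrative, not the actual eval script):

```python
import torch

def subsample(points, num_points):
    """Randomly keep num_points points from each cloud in a (B, N, 3) batch."""
    idx = torch.randperm(points.shape[1])[:num_points]
    return points[:, idx, :]

# example: shrink a batch of 10000-point clouds to each evaluation size
clouds = torch.randn(4, 10000, 3)   # stand-in for a test batch
for n in [100, 500, 1000, 5000, 10000]:
    assert subsample(clouds, n).shape == (4, n, 3)
```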

For classification:

Num points | 100 | 500 | 1000 | 5000 | 10000 (Q1)
Accuracy | 93.28% | 95.69% | 96.33% | 96.33% | 96.22%

I notice that accuracy does not fall considerably. This could be because we use a global max pool, so we do not need every point to detect an object category. Correct predictions can be made as long as enough points from critical locations survive to disambiguate between classes. Hence, for classification the numbers align with expectations.
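
For reference, a minimal sketch of this global-max-pool design (simplified channel widths, not my exact training code):

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Simplified PointNet classifier: a shared per-point MLP followed by a global max pool."""
    def __init__(self, num_classes=3):
        super().__init__()
        # shared MLP applied independently to every point (1x1 convolutions over the point axis)
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(),
                                  nn.Linear(256, num_classes))

    def forward(self, points):                          # points: (B, N, 3)
        feats = self.point_mlp(points.transpose(1, 2))  # per-point features: (B, 1024, N)
        global_feat = feats.max(dim=2).values           # max over points: only "critical" points matter
        return self.head(global_feat)                   # class logits: (B, num_classes)
```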

For Segmentation:

Num points | 100 | 500 | 1000 | 5000 | 10000 (Q2)
Accuracy | 78.44% | 86.81% | 88.87% | 90.03% | 90.10%

I notice that accuracy does fall considerably. This is mainly because effective segmentation is difficult with sparse points, as there is more ambiguity: isolated points have less evidence from their neighbors (and from global context) to support accurate predictions. I also notice that per-point segmentation accuracy improves as the number of sampled points increases. Hence, even for segmentation the numbers align with expectations.
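
A minimal sketch of how global context enters the per-point prediction (again with simplified widths; the actual segmentation model may differ, and the six part labels are assumed):

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Simplified PointNet segmentation head: each point is classified from its own
    local feature concatenated with the global max-pooled feature."""
    def __init__(self, num_seg_classes=6):   # assumed number of part labels
        super().__init__()
        self.local_mlp = nn.Sequential(       # per-point (local) features
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.global_mlp = nn.Sequential(      # lifts local features to the global descriptor
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.seg_head = nn.Sequential(        # per-point classifier over [local || global] features
            nn.Conv1d(128 + 1024, 256, 1), nn.ReLU(),
            nn.Conv1d(256, num_seg_classes, 1),
        )

    def forward(self, points):                                                # (B, N, 3)
        local = self.local_mlp(points.transpose(1, 2))                        # (B, 128, N)
        global_feat = self.global_mlp(local).max(dim=2, keepdim=True).values  # (B, 1024, 1)
        n = points.shape[1]
        fused = torch.cat([local, global_feat.expand(-1, -1, n)], dim=1)      # (B, 1152, N)
        return self.seg_head(fused)           # per-point logits: (B, num_seg_classes, N)
```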

3.2 Analysis with respect to rotations

Here I consider the change in model performance with respect to rotation of the point cloud about the Z axis. I add an argument --do3.2 to the eval scripts. When invoked, it computes the test accuracy for objects rotated by {0, 15, 30, 45, 60, 90} degrees.
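
The rotation itself is a standard z-axis rotation matrix applied to every point; a sketch of the idea (the axis and sign conventions in my actual code may differ):

```python
import math
import torch

def rotate_z(points, angle_deg):
    """Rotate a batch of point clouds (B, N, 3) about the z-axis by angle_deg degrees."""
    t = math.radians(angle_deg)
    rot = torch.tensor([[math.cos(t), -math.sin(t), 0.0],
                        [math.sin(t),  math.cos(t), 0.0],
                        [0.0,          0.0,         1.0]],
                       dtype=points.dtype, device=points.device)
    return points @ rot.T   # row-vector convention: p' = p R^T
```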

For classification:

Rotation angle (deg) | 0 (Q1) | 15 | 30 | 45 | 60 | 90
Accuracy | 96.22% | 90.76% | 73.24% | 50.26% | 31.58% | 20.56%

It is evident from the numbers that PointNet is not rotation invariant: there is a significant reduction in performance as the rotation angle increases, and the drop is smaller for small angles (as expected). I suspect data augmentation with random rotations would somewhat improve performance, as would estimating the rotation and mapping the points back into a canonical axis-aligned frame.
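
A sketch of the augmentation I have in mind (random z-rotations applied per cloud during training; not something I actually ran for the numbers above):

```python
import math
import torch

def random_z_rotation(points):
    """Apply an independent random z-rotation to each cloud in a (B, N, 3) batch."""
    b = points.shape[0]
    theta = torch.rand(b, device=points.device, dtype=points.dtype) * 2 * math.pi
    cos, sin = torch.cos(theta), torch.sin(theta)
    zero, one = torch.zeros_like(theta), torch.ones_like(theta)
    rot = torch.stack([
        torch.stack([cos, -sin, zero], dim=-1),
        torch.stack([sin,  cos, zero], dim=-1),
        torch.stack([zero, zero, one], dim=-1),
    ], dim=1)                                     # (B, 3, 3), one rotation matrix per cloud
    return torch.bmm(points, rot.transpose(1, 2)) # rotate every point in every cloud
```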

For Segmentation:

Rotation angle (deg) | 0 (Q2) | 15 | 30 | 45 | 60 | 90
Accuracy | 90.10% | 81.95% | 69.11% | 59.52% | 50.69% | 42.62%

It is evident from the numbers that PointNet is not rotation invariant for the segmentation task either; there is a significant reduction in performance as the rotation angle increases.

[Visualization: one chair shown at rotations of 0, 15, 30, 45, 60, and 90 degrees, with the ground truth and the model's prediction at each rotation]

To confirm that the rotations are applied correctly, I visualize one shape and its prediction as the rotation angle increases.

4 Bonus Question

Here, I implement the DGCNN model for point cloud classification and segmentation. Specifically, I use the PyTorch Geometric library (a library similar to PyTorch3D, maintained for graph neural networks) for this implementation. The model can be found in the dgcnn folder. Key changes include rewriting the dataloader with the PyTorch Geometric DataLoader, which stores the 3D point positions in objects of its Data class. This implementation takes a lot of GPU memory, so I reduce the model's complexity for faster training. The model is built from DynamicEdgeConv blocks (each containing MLP layers).
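
As a rough illustration, the classification variant looks something like the sketch below, assuming k = 20 nearest neighbours and reduced channel widths; the code in the dgcnn folder differs in detail:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import DynamicEdgeConv, global_max_pool

def edge_mlp(in_dim, out_dim):
    # DynamicEdgeConv feeds [x_i, x_j - x_i] (2 * in_dim features) per edge into this MLP
    return nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU(),
                         nn.Linear(out_dim, out_dim), nn.ReLU())

class DGCNNCls(nn.Module):
    """Reduced-capacity DGCNN classifier in the spirit of the bonus implementation."""
    def __init__(self, k=20, num_classes=3):
        super().__init__()
        self.conv1 = DynamicEdgeConv(edge_mlp(3, 64), k=k, aggr='max')
        self.conv2 = DynamicEdgeConv(edge_mlp(64, 128), k=k, aggr='max')
        self.head = nn.Sequential(nn.Linear(128, 128), nn.ReLU(),
                                  nn.Linear(128, num_classes))

    def forward(self, data):
        # data.pos: (total_points, 3); data.batch maps each point to its cloud
        x = self.conv1(data.pos, data.batch)   # kNN graph built on raw coordinates
        x = self.conv2(x, data.batch)          # kNN graph rebuilt on learned features
        x = global_max_pool(x, data.batch)     # one 128-d descriptor per cloud
        return self.head(x)                    # class logits
```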

For classification:

Model | PointNet | DGCNN
Accuracy | 96.22% | 97.59%

[Visualization: three easy examples]
GT | Chair | Vase | Lamp
PointNet | Chair | Vase | Lamp
DGCNN | Chair | Vase | Lamp

[Visualization: three difficult examples]
GT | Chair | Vase | Lamp
PointNet | Vase | Lamp | Vase
DGCNN | Chair | Lamp | Lamp

I observe that DGCNN gives a slight improvement in test accuracy. Qualitatively, on a few difficult cases (visualized above) where PointNet fails, DGCNN is able to predict the correct class.

For segmentation:

Model | PointNet | DGCNN
Accuracy | 90.10% | 89.49%

I observe that DGCNN slightly decreases segmentation accuracy. I believe this could be because I reduced the DGCNN model's complexity for faster processing.