Q1. Classification Model (40 points)

Training loss and testing accuracy curves visualization

Figure layout:

Left figure is the training loss tensorboard curve.
Right figure is the testing accuracy tensorboard curve output.

Implementation Description (MLP):

The architecture was implemented as specified in the PointNet paper by Qi et al. (2017).
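The core design the paper specifies is a shared per-point MLP followed by a symmetric max pool, which makes the global feature invariant to point ordering. A minimal numpy sketch of that idea (the weight matrices `W1`, `W2` are random placeholders, not our trained layers, and the real model uses several conv1d + batch-norm + ReLU stages):

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_global_feature(points, W1, W2):
    """Shared per-point MLP (two ReLU layers) followed by a max pool
    over the point dimension -> a permutation-invariant global feature."""
    h = np.maximum(points @ W1, 0.0)   # shared MLP layer 1 (ReLU)
    h = np.maximum(h @ W2, 0.0)        # shared MLP layer 2 (ReLU)
    return h.max(axis=0)               # symmetric max pool over all points

pts = rng.normal(size=(1000, 3))       # one point cloud, N x 3
W1 = rng.normal(size=(3, 64))          # placeholder weights
W2 = rng.normal(size=(64, 1024))
feat = pointnet_global_feature(pts, W1, W2)

# Shuffling the points leaves the global feature unchanged.
shuffled = pts[rng.permutation(1000)]
```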

Deliverables:

The test accuracy of our best model is 97.90%.

Visualization of correct predictions:

Visualization of a correct prediction of a chair from our best model.

Visualization of a correct prediction of a vase from our best model.

Visualization of a correct prediction of a lamp from our best model.

Visualization of incorrect predictions:

Groundtruth for this sample is a chair, but our best model predicted it as a lamp.


Groundtruth for this sample is a lamp, but our best model predicted it as a vase.

Groundtruth for this sample is a vase, but our best model predicted it as a lamp.

Interpretation

Overall, the classification model achieves a high accuracy of 97.90%, so only 20 samples in the test set are misclassified. The incorrect predictions look reasonable as well. For example, in the first incorrect prediction our best model predicted a lamp instead of a chair: the points in this sample most closely resemble the overall structure of a lamp, with the point distribution lying mostly along the vertical direction and very little spread horizontally. The same goes for the other two incorrect predictions. Since PointNet associates each input with the shapes it encountered in the training set, the incorrect predictions do make sense; indeed, the third incorrect prediction is deceiving even to the human eye at first glance!


Q2. Segmentation Model (40 points)

Training loss and testing accuracy curves visualization

Figure layout:

Left figure is the training loss tensorboard curve.
Right figure is the testing accuracy tensorboard curve output.

Implementation Description (MLP):

The architecture was implemented as specified in the PointNet paper by Qi et al. (2017).
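For segmentation, the paper's architecture concatenates the pooled global feature back onto every per-point feature before the per-point classifier, so each point sees both local and global context. A minimal sketch with illustrative feature sizes (64-d per-point features, 1024-d global feature; the function name is ours, not the paper's):

```python
import numpy as np

def pointnet_seg_input(point_feats, global_feat):
    """Tile the (D_g,) global feature across all N points and concatenate
    it with the (N, D_p) per-point features -> (N, D_p + D_g)."""
    n = point_feats.shape[0]
    tiled = np.tile(global_feat, (n, 1))
    return np.concatenate([point_feats, tiled], axis=1)

rng = np.random.default_rng(0)
per_point = rng.normal(size=(1000, 64))    # per-point features
global_f = rng.normal(size=(1024,))        # pooled global feature
seg_in = pointnet_seg_input(per_point, global_f)
```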

Deliverables:

The test accuracy of our best model is 90.23%.

Visualization of correct segmentations:

Figure layout:

Left figure is the groundtruth segmentation.
Right figure is the predicted segmentation output.
Correct segmentation object #111 from our best model.
Accuracy: 99.48%

Correct segmentation object #215 from our best model.
Accuracy: 99.47%

Correct segmentation object #434 from our best model.
Accuracy: 99.45%

Correct segmentation object #462 from our best model.
Accuracy: 99.33%

Correct segmentation object #562 from our best model.
Accuracy: 99.32%

Visualization of less accurate segmentations:

Figure layout:

Left figure is the groundtruth segmentation.
Right figure is the predicted segmentation output.
Relatively less accurate segmentation object #26 from our trained best model.
Accuracy: 45.06%

Relatively less accurate segmentation object #351 from our trained best model.
Accuracy: 50.68%


Interpretation

Visual inspection suggests that the inaccurate segmentations make sense. For the first inaccurate output, at first glance it is not immediately obvious whether the chair has legs or whether the seat is large and extends to the floor. For the other inaccurate output, it is difficult even for the human eye to decide whether the structure resembles legs, and the pillow on the chair is likewise hard to segment. Such cases produce inaccurate segmentations because the training dataset does not contain enough examples resembling them.


Q3. Robustness Analysis (20 points)


Experiment 1: Rotating Pointclouds (10 points)

Using look_at_view_transform, I rotate and translate the point clouds by changing the distance, elevation, and azimuth. This generates a rotation matrix and a translation vector, which I use to rotate and/or translate the test point cloud. For context, I roughly turned the object around, flipping its back to the front.
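The transform above can be sketched as composing elevation and azimuth rotations and applying them to the cloud. This is a minimal numpy sketch, not the PyTorch3D API: the helper name and axis conventions are illustrative, while the real look_at_view_transform returns an (R, T) pair built in a similar spirit.

```python
import numpy as np

def rotation_from_view(elev_deg, azim_deg):
    """Compose an elevation rotation (about x) with an azimuth rotation
    (about y). Axis conventions here are illustrative."""
    e, a = np.radians(elev_deg), np.radians(azim_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(e), -np.sin(e)],
                   [0, np.sin(e),  np.cos(e)]])
    Ry = np.array([[ np.cos(a), 0, np.sin(a)],
                   [ 0,         1, 0        ],
                   [-np.sin(a), 0, np.cos(a)]])
    return Ry @ Rx

# "Flip the back to the front": azimuth of 180 degrees
R = rotation_from_view(0.0, 180.0)
points = np.random.default_rng(0).random((1000, 3))
rotated = points @ R.T          # add a translation vector t afterwards if desired
```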

Effect on Classification

If I only rotate the point cloud with random rotations, the accuracy decreases from 97.90% to 38.50%. Furthermore, if I both rotate and translate the test point cloud, the accuracy further drops to 24.44%.

Effect on Segmentation

Similarly, the accuracy of our segmentation model drops from 90.23% to 45.2% (rotation only) and 44.5% (rotation + translation).

Interpretation

The experiments above suggest that the learned model only performs well when the input point cloud is not perturbed (transformed) by a large amount; accuracy drops substantially at larger deviation angles. One way to tackle this and make the model more robust would be to augment the training data with such random rotations and translations.
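The suggested augmentation could be sketched as a random rigid transform applied to each training cloud. This is a hypothetical helper, not part of our training code; it rotates about the up (y) axis and adds a small random shift.

```python
import numpy as np

def augment_rigid(points, rng, max_shift=0.1):
    """Hypothetical train-time augmentation: random rotation about the
    y axis plus a small random translation of an (N, 3) point cloud."""
    a = rng.uniform(0.0, 2.0 * np.pi)
    Ry = np.array([[ np.cos(a), 0, np.sin(a)],
                   [ 0,         1, 0        ],
                   [-np.sin(a), 0, np.cos(a)]])
    t = rng.uniform(-max_shift, max_shift, size=3)
    return points @ Ry.T + t

rng = np.random.default_rng(0)
cloud = rng.random((1000, 3))
augmented = augment_rigid(cloud, rng)
```

Because the transform is rigid, pairwise distances between points are preserved, so the object's shape (and hence its label) is unchanged.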

Experiment 2: Changing the number of points per object (10 points)

This experiment largely preserved the accuracy of the existing trained model. Here, we reduced the number of points per test point cloud from 10,000 to 1,000 and to 100.
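The subsampling step can be sketched as random selection without replacement (a minimal sketch; the assignment code may sample differently):

```python
import numpy as np

def subsample(points, n_points, rng):
    """Keep a random subset of n_points rows from an (N, 3) cloud."""
    idx = rng.choice(points.shape[0], size=n_points, replace=False)
    return points[idx]

rng = np.random.default_rng(0)
cloud = rng.random((10000, 3))
cloud_1k = subsample(cloud, 1000, rng)   # test @ 1,000 points
cloud_100 = subsample(cloud, 100, rng)   # test @ 100 points
```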

Effect on Classification @ 1000 points

The classification accuracy nearly retains its original value: the score drops only from 97.90% to 97.48%.

Effect on Classification @ 100 points

The score now drops from 97.90% to 90.12%.

Effect on Segmentation @ 1000 points

Segmentation shows a similar trend: the accuracy of our model almost retains its original value, dropping only from 90.23% to 89.27%.

Effect on Segmentation @ 100 points

As above, the score now drops substantially, down to 77.13%.

Interpretation

This experiment shows that the PointNet architecture is able to retain its accuracy despite a reduced number of points; however, reducing the number of points drastically (~100) destroys the structural information that the learned model relies on.