Q1. Classification Model (40 points)
Training loss and testing accuracy curves visualization
Figure layout: the left figure is the training loss TensorBoard curve; the right figure is the test accuracy TensorBoard curve.
Implementation Description (MLP):
The architecture was implemented as specified in the PointNet paper by Qi et al. (2017).
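Since the report only states that the architecture follows Qi et al. (2017), here is a minimal PyTorch sketch of the classification network. The layer widths (64-64-128-1024, then 512-256-k) follow the paper, but the omission of the T-Net alignment blocks and all names are illustrative assumptions, not the exact implementation used here.

```python
import torch
import torch.nn as nn

class PointNetCls(nn.Module):
    """Minimal PointNet classifier sketch: shared per-point MLP -> max pool -> FC head.
    The input-transform and feature-transform T-Nets from the paper are omitted."""
    def __init__(self, num_classes=3):
        super().__init__()
        # Shared MLP applied to every point independently (implemented as 1x1 convs).
        self.feat = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head on the pooled global feature.
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):            # points: (B, N, 3)
        x = points.transpose(1, 2)        # (B, 3, N) for Conv1d
        x = self.feat(x)                  # (B, 1024, N) per-point features
        x = torch.max(x, dim=2).values    # (B, 1024) symmetric max pooling
        return self.head(x)               # (B, num_classes) logits
```

The max pooling is the key design choice: it makes the network invariant to the ordering of the input points.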
Deliverables:
The test accuracy of our best model is 97.90%.
Visualization of correct predictions:
Visualization of a correct prediction of a chair from our best model.
Visualization of a correct prediction of a vase from our best model.
Visualization of a correct prediction of a lamp from our best model.
Visualization of incorrect predictions:
The ground truth for this sample is a chair, but our best model predicted it as a lamp.
The ground truth for this sample is a lamp, but our best model predicted it as a vase.
The ground truth for this sample is a vase, but our best model predicted it as a lamp.
Interpretation
Overall, the classification model achieves a high accuracy of 97.90%, which corresponds to only 20 incorrect predictions on the entire test set. The incorrect predictions look reasonable as well. For example, in the first incorrect prediction our best model predicted a lamp instead of a chair: the points in this sample closely resemble the overall structure of a lamp, with the point distribution lying mostly along the vertical direction and very little stretch in the horizontal direction. The same holds for the other two incorrect predictions. Since the PointNet model associates each prediction with the shapes it encountered in the training set, the incorrect predictions make sense; indeed, the third incorrect prediction is deceiving even to the human eye at first glance!
Q2. Segmentation Model (40 points)
Training loss and testing accuracy curves visualization
Figure layout: the left figure is the training loss TensorBoard curve; the right figure is the test accuracy TensorBoard curve.
Implementation Description (MLP):
The architecture was implemented as specified in the PointNet paper by Qi et al. (2017).
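As with the classifier, a minimal sketch of the segmentation variant is given below, assuming the standard PointNet design in which per-point features are concatenated with the tiled global feature before a per-point head. The T-Nets are again omitted, and the segmentation class count is a placeholder.

```python
import torch
import torch.nn as nn

class PointNetSeg(nn.Module):
    """Minimal PointNet segmentation sketch: local per-point features are
    concatenated with the tiled 1024-d global feature, as in Qi et al. (2017)."""
    def __init__(self, num_seg_classes=6):
        super().__init__()
        self.local = nn.Sequential(        # per-point local features
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.global_feat = nn.Sequential(  # lifts local features to 1024-d
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.head = nn.Sequential(         # per-point segmentation head
            nn.Conv1d(64 + 1024, 512, 1), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Conv1d(512, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, num_seg_classes, 1),
        )

    def forward(self, points):                             # points: (B, N, 3)
        x = points.transpose(1, 2)                         # (B, 3, N)
        local = self.local(x)                              # (B, 64, N)
        g = torch.max(self.global_feat(local), 2).values   # (B, 1024) global feature
        g = g.unsqueeze(2).expand(-1, -1, local.shape[2])  # tile over all N points
        x = torch.cat([local, g], dim=1)                   # (B, 1088, N)
        return self.head(x).transpose(1, 2)                # (B, N, num_seg_classes) logits
```

Concatenating the global feature with each point's local feature is what lets the network label each point with awareness of the overall shape.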
Deliverables:
The test accuracy of our best model is 90.23%.
Visualization of correct segmentations:
Figure layout: the left figure is the ground-truth segmentation; the right figure is the predicted segmentation output.
Correct segmentation of object #111 from our best model (accuracy: 99.48%).
Correct segmentation of object #215 from our best model (accuracy: 99.47%).
Correct segmentation of object #434 from our best model (accuracy: 99.45%).
Correct segmentation of object #462 from our best model (accuracy: 99.33%).
Correct segmentation of object #562 from our best model (accuracy: 99.32%).
Visualization of less accurate segmentations:
Figure layout: the left figure is the ground-truth segmentation; the right figure is the predicted segmentation output.
Relatively less accurate segmentation of object #26 from our best model (accuracy: 45.06%).
Relatively less accurate segmentation of object #351 from our best model (accuracy: 50.68%).
Interpretation
Visual inspection suggests that the inaccurate segmentations make sense. For the first inaccurate output, it is not immediately obvious whether the chair has legs or whether the seat extends all the way to the bottom. For the other inaccurate output, it is difficult even for the human eye to decide whether the structure resembles legs, and the pillow on the chair is hard to segment as well. Such cases produce inaccurate segmentations because the training set does not contain enough data resembling them.
Q3. Robustness Analysis (20 points)
Experiment 1: Rotating Pointclouds (10 points)
Using look_at_view_transform, I rotate and translate the point clouds by changing the distance, elevation, and azimuth. This generates a rotation matrix and a translation vector, which I use to rotate and/or translate the test point clouds. For context, I roughly turned each object around, flipping its back to the front.
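A sketch of this perturbation is below, assuming PyTorch3D's look_at_view_transform; the specific dist/elev/azim values and the `perturb` helper are illustrative assumptions (azim=180 roughly corresponds to flipping back to front).

```python
import torch
from pytorch3d.renderer import look_at_view_transform

# look_at_view_transform returns a rotation R of shape (1, 3, 3) and a
# translation T of shape (1, 3) for the requested viewpoint.
R, T = look_at_view_transform(dist=6.0, elev=0.0, azim=180.0)

def perturb(points, R, T, translate=False):
    """Apply the rotation (and optionally the translation) to a batch of
    point clouds. points: (B, N, 3). PyTorch3D uses the row-vector
    convention, so transformed points are p @ R (+ T)."""
    out = points @ R.to(points.device)                   # rotate: (B, N, 3)
    if translate:
        out = out + T.to(points.device).unsqueeze(1)     # then translate
    return out

# Usage (test_points is an assumed name for the (B, 10000, 3) test tensor):
# rotated = perturb(test_points, R, T)                       # rotation only
# rotated_translated = perturb(test_points, R, T, translate=True)
```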
Effect on Classification
If I only rotate the point clouds with random rotations, the accuracy decreases from 97.90% to 38.50%. If I both rotate and translate the test point clouds, the accuracy drops further to 24.44%.
Effect on Segmentation
Similarly, the accuracy of our segmentation model dropped from 90.23% to 45.2% (rotation only) and 44.5% (rotation + translation).
Interpretation
The above experiments suggest that the learned model only performs well when the input point cloud is not perturbed (transformed) by a large amount; in other words, accuracy drops substantially at larger angles of deviation. One way to tackle this and make the model more robust would be to augment the training data with such random rotations and translations, as sketched below.
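A minimal sketch of such an augmentation, assuming rotations about the vertical (y) axis only; the axis choice, angle range, and function name are assumptions, and a full SO(3) version (plus random translations) could be built the same way.

```python
import math
import torch

def augment_with_random_rotation(points, max_angle=math.pi):
    """Rotate each point cloud in the batch by a random angle about the
    y axis during training. points: (B, N, 3) -> (B, N, 3)."""
    B = points.shape[0]
    theta = (torch.rand(B, device=points.device) * 2 - 1) * max_angle
    c, s = torch.cos(theta), torch.sin(theta)
    zeros, ones = torch.zeros_like(c), torch.ones_like(c)
    # One (3, 3) rotation matrix about the y axis per batch element.
    R = torch.stack([
        torch.stack([c, zeros, s], dim=1),
        torch.stack([zeros, ones, zeros], dim=1),
        torch.stack([-s, zeros, c], dim=1),
    ], dim=1)                                # (B, 3, 3)
    return torch.bmm(points, R)              # rotated point clouds
```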
Experiment 2: Changing the number of points per object (10 points)
This experiment largely preserved the accuracy of the trained best model. Here, we reduced the number of points per test point cloud from 10000 to 1000 and to 100, as in the subsampling sketch below.
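A sketch of the subsampling, assuming one shared random subset of indices per batch (sampling independently per cloud works analogously); the helper name is an assumption.

```python
import torch

def subsample(points, num_points):
    """Randomly keep num_points points from each test cloud.
    points: (B, 10000, 3) -> (B, num_points, 3)."""
    idx = torch.randperm(points.shape[1], device=points.device)[:num_points]
    return points[:, idx, :]

# Usage: subsample(test_points, 1000) or subsample(test_points, 100)
```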
Effect on Classification @ 1000 points
Classification accuracy almost retains its original value: the score drops only from 97.90% to 97.48%.
Effect on Classification @ 100 points
The score now drops from 97.90% to 90.12%.
Effect on Segmentation @ 1000 points
Segmentation shows a similar trend: the accuracy of our model almost retains its original value, dropping only from 90.23% to 89.27%.
Effect on Segmentation @ 100 points
As with classification, the score now drops substantially, from 90.23% to 77.13%.
Interpretation
This experiment shows that the PointNet architecture retains its accuracy despite a reduction in the number of points; however, reducing the number of points substantially (to ~100) destroys the structural information that the learned model relies on.