1) Test accuracy of best model: 0.9627
2) Visualizations
Class 0 (Chair):
Correctly predicted examples
Incorrectly predicted example - predicted as Lamp (class 2). This makes sense because, unlike the two correctly predicted chairs above, the base of this chair is folded inwards, so the chair may appear as an outlier to the classification network.
Class 1 (Vase):
Correctly predicted examples
Incorrectly predicted example - predicted as Lamp (class 2). This incorrect prediction is also not so surprising, as the vase is symmetrical about the z-axis (up-axis), just like the stem of a typical lamp. The dangling flower in the vase could have been misconstrued by the classification network as the bulb of a lamp, tipping the prediction in favor of "Lamp" over "Vase".
Class 2 (Lamp):
Correctly predicted examples
Incorrectly predicted example - predicted as Vase (class 1). This is an extremely difficult point cloud to classify because the object is situated upside down. In this orientation, even a human may not be able to discern the identity of the underlying object and could justifiably classify it as a vase.
1) Test accuracy of best model: 0.8861
2) Visualizations
good segmentations (prediction accuracy > 0.95)
(left: predictions, right: corresponding gt)
prediction accuracy: 0.973
prediction accuracy: 0.9695
prediction accuracy: 0.9666
prediction accuracy: 0.9562
bad segmentations (prediction accuracy < 0.70)
(left: predictions, right: corresponding gt)
prediction accuracy: 0.4926
prediction accuracy: 0.5141
prediction accuracy: 0.6067
prediction accuracy: 0.6394
Interpretation:
We display the four objects with the highest prediction accuracies and the four objects with the lowest. The four objects with the highest accuracies have similar structures, both in geometry and in segmentation ground truth - all four are annotated with only three segments out of the possible six. We hypothesize that this structure is the dominant mode in the dataset, which is why the segmentation network performs well on these objects.
On the other hand, the four objects with the lowest prediction accuracies have wildly different geometric structures and are annotated with the maximum possible number of segments. The segmentation predictions appear biased towards those observed for the high-accuracy objects: the backrest of the chair is assigned to the class 'cyan', the mid-rim to the class 'red', and the legs to the class 'dark blue'. The network seems unable to properly predict other distinct chair segments, such as the side arms in objects 1 and 3 (in yellow), the leg rest in object 2 (in white), and the extremely long legs in object 4 (in dark blue), which we hypothesize are out-of-distribution relative to the dominant modes of the dataset.
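For reference, the per-object prediction accuracies reported above are simply the fraction of points whose predicted segment matches the ground truth. A minimal sketch of this metric, assuming the predictions and labels for one object are integer tensors of per-point segment ids (the helper name and ranking usage are illustrative, not part of our training code):

```python
import torch

def per_object_seg_accuracy(pred_labels: torch.Tensor, gt_labels: torch.Tensor) -> float:
    """Fraction of points whose predicted segment id matches the ground truth.

    pred_labels, gt_labels: (num_points,) integer tensors of segment ids.
    """
    return (pred_labels == gt_labels).float().mean().item()

# Hypothetical usage: rank objects by accuracy to pick best/worst examples.
# accs = [per_object_seg_accuracy(p, g) for p, g in zip(all_preds, all_gts)]
# order = sorted(range(len(accs)), key=lambda i: accs[i])
```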
In this experiment, we analyze the robustness of our learned classification and segmentation models to the number of sampled points in the input point cloud. We compute the classification and segmentation accuracy when we use 1, 1/2, 1/4, 1/8, and 1/16 of the original number of points (= 10000). As shown in the following figure, the classification model is very robust to the number of sampled points; even when we use only 1/16 of the points in the original point cloud, we incur less than a 1% drop in classification accuracy.
What is even more surprising is how the segmentation accuracy changes w.r.t. the number of sampled points. The segmentation accuracy actually increases (rather than decreases) as the fraction of discarded points grows. Since the network weights are already learned and fixed in this experiment, this increase in segmentation accuracy is most likely attributable to better global features computed during the max-pooling stage; somehow a coarser input point cloud yields better global features for 3D semantic segmentation.
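A minimal sketch of how such a sweep can be run, assuming a trained classifier that maps a batch of point clouds to per-class logits (the function name and model interface are assumptions for illustration; for segmentation, the per-point labels would be indexed with `idx` as well):

```python
import torch

@torch.no_grad()
def cls_accuracy_vs_num_points(model, points, labels,
                               fractions=(1.0, 0.5, 0.25, 0.125, 0.0625)):
    """Classification accuracy after randomly subsampling each point cloud.

    points: (B, N, 3) tensor; labels: (B,) class ids.
    `model` is the trained classifier: points in, (B, num_classes) logits out.
    """
    model.eval()
    n = points.shape[1]
    results = {}
    for frac in fractions:
        idx = torch.randperm(n)[:max(1, int(n * frac))]  # random subset of points
        preds = model(points[:, idx, :]).argmax(dim=-1)
        results[frac] = (preds == labels).float().mean().item()
    return results
```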
In this experiment, we analyze the robustness of our learned classification and segmentation models to rotations about an axis applied to the input point cloud. In other words, we analyze whether our models have learned any rotation invariance / equivariance for classification and segmentation, respectively. To this end, we compute the classification and segmentation accuracy for varying angles of rotation $\theta$ (0, 30, 60, 90, 120, 150, 180 degrees) about a randomly sampled unit vector $\hat{\omega} = [0.2595, 0.0749, -0.9628]^{\top} \in \mathbb{R}^{3}$ that represents the rotation axis. We compute the rotation matrix via Rodrigues' rotation formula:
$$\textrm{Rot}(\hat{\omega}, \theta) = e^{[\hat{\omega}]\theta} = I + \sin\theta\,[\hat{\omega}] + (1 - \cos\theta)[\hat{\omega}]^{2}$$

where
$$[x] = \left[\begin{array}{ccc} 0 & -x_{3} & x_{2} \\ x_{3} & 0 & -x_{1} \\ -x_{2} & x_{1} & 0 \end{array}\right]$$

We can see that the learned classification and segmentation models are not as robust to input rotations as they were to the number of sampled points in the input point cloud. This demonstrates that the network has failed to learn rotation invariance / equivariance (at least for the dataset we have been training on), and we would therefore need a different architecture if we want to guarantee this particular inductive bias.
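For concreteness, a minimal sketch of the rotation applied to the input point clouds, directly implementing the Rodrigues formula above (the function name is ours; the axis in the usage comment is the one sampled above):

```python
import math
import torch

def rodrigues(axis: torch.Tensor, theta: float) -> torch.Tensor:
    """Rotation matrix e^{[w] * theta} via Rodrigues' formula (theta in radians)."""
    x, y, z = (axis / axis.norm()).tolist()      # ensure the axis is a unit vector
    K = torch.tensor([[0.0, -z, y],
                      [z, 0.0, -x],
                      [-y, x, 0.0]])             # skew-symmetric matrix [w]
    return torch.eye(3) + math.sin(theta) * K + (1.0 - math.cos(theta)) * (K @ K)

# Apply to a point cloud of shape (N, 3):
# R = rodrigues(torch.tensor([0.2595, 0.0749, -0.9628]), math.radians(30))
# rotated = points @ R.T
```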