The test accuracy from the best classification model is 97.9%. Some correctly classified examples are shown below:
(Figures: correctly classified examples of a Vase, a Chair, and a Lamp.)
Some incorrectly classified examples are:
- True label: Chair, predicted label: Lamp
- True label: Lamp, predicted label: Vase
- True label: Vase, predicted label: Lamp
Interpretation:
The model performs classification very well, with a final test accuracy of 97.9%, and most samples in the test set are correctly classified. As seen above, the three misclassified samples are rather hard cases. The chair is observed at an angle that makes it difficult to distinguish from a lamp. Similarly, the vase is a hard example: it doesn't resemble a typical vase, and the network ends up classifying it as a lamp. Although the second example, the lamp, is somewhat simpler and looks easier to classify, the network falters here, probably because the lamp's base somewhat resembles a vase.
The test accuracy from the best segmentation model is 89.72%. The ground truth and predicted segmentations for 5 test samples are shown below:
Sample | Result | Segmentation accuracy (%) |
---|---|---|
1 | Good segmentation | 93.5 |
2 | Good segmentation | 98.5 |
3 | Bad segmentation | 72.7 |
4 | Good segmentation | 82.3 |
5 | Bad segmentation | 74.2 |
Interpretation:
The segmentation model performs decently well on the entire test set, giving an accuracy of 89.7%. However, it struggles to segment parts of certain chairs in the test set. In the examples provided above, samples 1, 2, and 4 were well segmented by the network. They do indeed look like simpler chairs for the network to parse into parts, and their individual segmentation accuracies (82% to 98%) also indicate that the network has done a good job of predicting the per-point classes. However, samples 3 and 5 were poorly segmented by the network, which is apparent both visually and from their individual segmentation accuracies; both are in the 70s. In sample 3, the network is unable to properly distinguish between the backrest and the base (the cyan and red points overlap). In sample 5, the network is unable to properly distinguish between the base and the legs, with the red part being much larger in the predicted cloud than in the ground-truth cloud.
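For reference, the per-sample segmentation accuracy quoted above is simply the fraction of points whose predicted part label matches the ground truth. A minimal sketch of this metric, assuming `pred` and `gt` are `(N,)` integer label tensors for one cloud (the names are illustrative, not the actual code):

```python
import torch

def segmentation_accuracy(pred: torch.Tensor, gt: torch.Tensor) -> float:
    # Fraction of points whose predicted part label matches the ground truth.
    # pred, gt: (N,) integer part-label tensors for a single point cloud.
    return (pred == gt).float().mean().item()
```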
Experiment 1:
I tested the classification and segmentation models with different values of num_points. The results are summarized below:
num_points | Classification accuracy (%) | Segmentation accuracy (%) |
---|---|---|
100 | 92.7 | 83.66 |
500 | 97.58 | 89.01 |
1000 | 97.37 | 89.70 |
1500 | 97.69 | 89.88 |
As can be seen in the table above, increasing the number of points per object improves both the classification and segmentation accuracies, with most of the gain coming from moving from 100 to 500 points; beyond that, the returns diminish.
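These num_points variants can be produced by subsampling each cloud before it is fed to the network. A minimal sketch of one way to do this (random subsampling; the function name and the `(N, 3)` points / `(N,)` labels layout are assumptions):

```python
import torch

def subsample_cloud(points: torch.Tensor, labels: torch.Tensor, num_points: int):
    # Randomly keep num_points of the N input points, along with their
    # per-point part labels (the labels matter only for segmentation).
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx], labels[idx]
```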
Experiment 2:
I tested the classification and segmentation models on rotated point clouds. The point clouds were rotated by 90 degrees about the z axis (a sketch of the rotation follows the outputs below).
Classification accuracy: 27.3%
Segmentation accuracy: 34.16%
Outputs from classification:
- Ground truth: Chair, predicted: Vase
- Ground truth: Chair, predicted: Vase
Outputs from segmentation:
(Figures: ground truth vs. predicted segmentation for two rotated samples.)
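A minimal sketch of the rotation used in this experiment, written with a plain rotation matrix (the function name and the batched `(B, N, 3)` input layout are assumptions):

```python
import math
import torch

def rotate_about_z(points: torch.Tensor, degrees: float = 90.0) -> torch.Tensor:
    # Rotate a (B, N, 3) batch of point clouds about the z axis by `degrees`.
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    R = torch.tensor([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]], dtype=points.dtype, device=points.device)
    return points @ R.T  # row-vector convention: p' = p R^T
```

The sharp drop in both accuracies suggests the models were trained without rotation augmentation, so a 90-degree rotation pushes the inputs far outside the training distribution.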
I have implemented the PointNet++ architecture for classification.
Procedure: In addition to using linear layers, the PointNet++ architecture makes use of set abstraction levels, where each level extracts a new point set with fewer points but richer features. For each set of linear layers that uses set abstraction, I have implemented code to sample points from the point cloud to serve as centroids. For each of these centroids, I sample 50 points using PyTorch3D's `ball_query` function, and this set of points is then passed through the linear layers (a sketch follows the network structure below).
This is the network structure:
SA(512, 0.2, [64, 64, 128]) → SA(128, 0.4, [128, 128, 256]) → SA([256, 512, 1024]) → FC(512, 0.5) → FC(256, 0.5) → FC(3)
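Below is a minimal sketch of one such set abstraction level, built around PyTorch3D's `sample_farthest_points` and `ball_query` ops. The class name, the relative-coordinate featurization, and the simple treatment of padded neighbors are assumptions for illustration, not the exact implementation:

```python
import torch
import torch.nn as nn
from pytorch3d.ops import ball_query, sample_farthest_points

class SetAbstraction(nn.Module):
    # One single-scale-grouping set abstraction level: sample centroids with
    # farthest point sampling, group neighbors with a ball query, run a shared
    # MLP over each neighbor, then max-pool over every neighborhood.
    def __init__(self, npoint: int, radius: float, mlp_dims: list, nsample: int = 50):
        super().__init__()
        self.npoint, self.radius, self.nsample = npoint, radius, nsample
        layers, d = [], 3  # features here are just the 3D relative coordinates
        for out_d in mlp_dims:
            layers += [nn.Linear(d, out_d), nn.ReLU()]
            d = out_d
        self.mlp = nn.Sequential(*layers)

    def forward(self, xyz: torch.Tensor):
        # xyz: (B, N, 3) input cloud
        centroids, _ = sample_farthest_points(xyz, K=self.npoint)      # (B, npoint, 3)
        _, _, grouped = ball_query(centroids, xyz, K=self.nsample,
                                   radius=self.radius, return_nn=True) # (B, npoint, nsample, 3)
        grouped = grouped - centroids.unsqueeze(2)  # center each neighborhood
        # (ball_query pads neighborhoods with fewer than nsample points;
        # a full implementation would mask those entries before pooling)
        feats = self.mlp(grouped)                   # (B, npoint, nsample, mlp_dims[-1])
        return centroids, feats.max(dim=2).values  # pooled features per centroid
```

For example, the first level in the structure above corresponds to `SetAbstraction(512, 0.2, [64, 64, 128])`. For brevity, the sketch featurizes only coordinates; the real levels would also carry forward the pooled features from the previous level.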
The classification accuracy of PointNet++ was 95.47%.
Although this network is expected to perform better than the vanilla PointNet, I obtained a lower accuracy. I believe this is because of the lower num_points value (1500) I had to use due to GPU memory constraints while testing the network.