Late days used: 1
Accuracy on test dataset: 97.38%
Some sample predictions from the trained model:
Correctly predicted chairs
Correctly predicted vases
Correctly predicted lamps
Failure cases from the trained model
Chair predicted as lamp
Lamp predicted as vase
Vase predicted as lamp
Lamp predicted as vase
The model performs well overall, with only 22 incorrect predictions. The failure cases do make sense, and some of them were confusing to me at first as well. For the first prediction, the chair is folded, and the model probably associates a spread-out point cloud with chairs (since most chairs are in fact spread out in the 3D world). For the second case, the object does look like a vase, whereas it is probably an inverted lamp. For the third case, the vase sits on a pedestal, and the model probably associates a pedestal (or stand-like structure) with the base of a lamp, which humans would also do.
Accuracy on test dataset: 87.81%
Some correct sample predictions from the trained model:
Sample 1 (Accuracy: 86.87%)

| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 2 (Accuracy: 98.07%)
| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 3 (Accuracy: 95.84%)
| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 4 (Accuracy: 98.71%)
| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Failure cases from the trained model
Sample 1 (Accuracy: 67.19%)

| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 2 (Accuracy: 48.98%)

| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 3 (Accuracy: 51.34%)

| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
Sample 4 (Accuracy: 46.32%)

| Ground Truth Segmentation | Predicted Segmentation |
|---|---|
| ![]() | ![]() |
The model performs decently well in general. For the first failure case, the base of the chair does look like a seat-like structure, which would confuse the model into labelling it as a seat. For the second case, the seat and side handles of the chair are merged together for this particular object, so it makes sense that the model would label the entire part as the seat. For the third instance, I think the model actually does a better job of labelling the points than the ground-truth labels do. For the fourth instance, one arm of the chair is missing, and the model seems biased towards predicting the arm label along the horizontal edges of the object (hence the yellow-coloured points on the right part of the seat). All of these failure cases are fairly unusual chairs, so it seems reasonable that the model would make such predictions. A more diverse set of chair objects in the training dataset would probably lead to better test predictions.
| Number of points | Classification Test Acc. | Segmentation Test Acc. |
|---|---|---|
| 10000 (original expt.) | 97.38% | 87.81% |
| 1000 | 97.06% | 89.89% |
| 100 | 83.42% | 81.35% |
| Angle of rotation (degrees) | Classification Test Acc. | Segmentation Test Acc. |
|---|---|---|
| 0 (original expt.) | 97.38% | 87.81% |
| 30 | 53.62% | 51.35% |
| 60 | 78.69% | 46.14% |
| 90 | 39.76% | 24.98% |
To change the number of points, I used the `--num_points` argument in the evaluation scripts and set it to 1000 and 100.
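As a rough illustration, the subsampling could look like the sketch below; the function name is just illustrative, and the actual evaluation scripts may select points differently (e.g. with a fixed random seed).

```python
import torch

def subsample_points(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Keep a random subset of num_points points from each cloud.

    points: (batch_size, total_points, 3) -> (batch_size, num_points, 3).
    """
    # Draw a shared set of indices without replacement and index every cloud with it.
    idx = torch.randperm(points.shape[1], device=points.device)[:num_points]
    return points[:, idx, :]
```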
To rotate the point cloud, I used `look_at_view_transform` from `pytorch3d.renderer.cameras` to obtain rotation matrices for a specified number of degrees (specifically, I set the `elev` argument to the angle I wanted to rotate the point cloud by). To actually rotate the point cloud tensors (shape: batch_size x number_of_points x 3), I used `pt_tensor @ R[0].T`.
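Putting that together, here is a minimal sketch of the rotation step described above; the function name is illustrative, and the `dist` and `azim` values are just kept at fixed defaults since only `elev` is varied.

```python
import torch
from pytorch3d.renderer.cameras import look_at_view_transform

def rotate_point_cloud(points: torch.Tensor, elev_deg: float) -> torch.Tensor:
    """Rotate a batch of point clouds (batch_size x num_points x 3) by elev_deg degrees."""
    # look_at_view_transform returns rotation matrices R of shape (1, 3, 3)
    # (and translations T, which are not needed here).
    R, _ = look_at_view_transform(dist=1.0, elev=elev_deg, azim=0.0)
    # Apply the same rotation to every point: (B, N, 3) @ (3, 3) -> (B, N, 3).
    return points @ R[0].T.to(points.device)
```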
Reducing the number of points from 10000 to 1000 doesn't have much of an effect on the accuracy. This makes sense: the global feature vector in the PointNet implementation has 1024 dimensions and is produced by max pooling, so at most 1024 points can contribute to it, and 1000 points already come close to that budget. Reducing the number of points further does decrease the accuracy, which also makes sense by similar reasoning, since with only 100 points the global feature is built from far less information.
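To make that argument concrete, here is a small illustration (assuming the standard PointNet design, where the global feature is a max pool over 1024-dimensional per-point features): the argmax of the pooling shows that at most 1024 distinct points can contribute to the global descriptor.

```python
import torch

# Dummy per-point features after the shared MLP: (batch_size, num_points, 1024).
point_feats = torch.randn(4, 1000, 1024)

# Max pooling over the points dimension gives the 1024-d global feature; the
# argmax records which point "won" each dimension, so at most 1024 distinct
# points can ever contribute to the global descriptor.
global_feat, winners = torch.max(point_feats, dim=1)
print(global_feat.shape)            # torch.Size([4, 1024])
print(winners[0].unique().numel())  # number of contributing points, <= 1024
```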
Rotating the point cloud has a monotonic adverse effect on the segmentation task, suggesting that the model is not robust to rotations (which makes sense, as I did not build any priors into the model that would make it rotation invariant). On the classification task, however, the accuracy actually increases when the rotation angle goes from 30 to 60 degrees, but decreases again when the rotation is increased further to 90 degrees.
For this part, I implemented a PointNet++-based classification model. Please refer to the code (the `cls_model_plus` class in models.py) for details of the model.
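For orientation, below is a generic, simplified sketch of a PointNet++-style set-abstraction layer (sample centroids, group neighbours, run a shared MLP, max pool per group). It is illustrative only and is not the actual `cls_model_plus` implementation: random centroid sampling is used here where farthest-point sampling is the usual choice, and all layer sizes are made up.

```python
import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    """Simplified PointNet++-style set abstraction (illustrative, not cls_model_plus)."""

    def __init__(self, num_centroids: int, k: int, out_dim: int):
        super().__init__()
        self.num_centroids = num_centroids
        self.k = k
        # Shared point-wise MLP applied to every neighbour in every group.
        self.mlp = nn.Sequential(
            nn.Linear(3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.ReLU(),
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (B, N, 3). Pick centroids by random sampling (FPS is the usual choice).
        B, N, _ = xyz.shape
        idx = torch.randperm(N, device=xyz.device)[: self.num_centroids]
        centroids = xyz[:, idx, :]                                    # (B, M, 3)
        # Group the k nearest neighbours of each centroid ("cluster size" = k).
        dists = torch.cdist(centroids, xyz)                           # (B, M, N)
        knn_idx = dists.topk(self.k, dim=-1, largest=False).indices   # (B, M, k)
        groups = torch.gather(
            xyz.unsqueeze(1).expand(B, self.num_centroids, N, 3),
            2,
            knn_idx.unsqueeze(-1).expand(-1, -1, -1, 3),
        )                                                             # (B, M, k, 3)
        # Use coordinates relative to the centroid as local features.
        local = groups - centroids.unsqueeze(2)
        # Shared MLP on every neighbour, then max pool within each group.
        return self.mlp(local).max(dim=2).values                      # (B, M, out_dim)

# Example usage: 512 groups of 32 neighbours over a 10000-point cloud.
# layer = SetAbstraction(num_centroids=512, k=32, out_dim=128)
# out = layer(torch.randn(8, 10000, 3))   # (8, 512, 128)
```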
I was only able to train the model for 100 epochs. The best accuracy I obtained in those epochs is 95.38%, and the final training loss value is 17.35.
Here are some predictions from the trained model:
Correctly predicted chair
Correctly predicted vase
Correctly predicted lamp
Comparing samples for which PointNet gave wrong predictions
Sample Chair 1:
Prediction from PointNet++ based model: Chair
Prediction from PointNet based model: Lamp
Sample Lamp 1:
Prediction from PointNet++ based model: Lamp
Prediction from PointNet based model: Vase
Sample Vase 1:
Prediction from PointNet++ based model: Vase
Prediction from PointNet based model: Lamp
Note that I've compared samples (IDs: 406, 944, 650) that were misclassified by the PointNet-based model.
Since the model was trained for only 100 epochs, slightly lower accuracy is expected. Training the model for more epochs and using bigger cluster sizes would likely lead to better predictions (and hopefully correct predictions on some more of the samples that PointNet misclassified).