Test Accuracy of best model: 97.87%
Random Visualizations:
Predicted Vases
Predicted Lamps
Failure Cases:
Prediction: Lamp; Ground Truth: Chair
Prediction: Lamp; Ground Truth: Vase
Prediction: Chair; Ground Truth: Vase
Prediction: Vase; Ground Truth: Lamp
Overall the model does quite well on most categories, and the failure cases are understandable. In the first case, the chair is folded and so doesn't look like a typical chair. The second is ambiguous even for a human: it could plausibly be either a lamp or a vase. In the third case, the model predicts chair, which does not seem correct at all, though the object is broad and roughly matches chairs in size and overall appearance. The fourth is again a tough one: it is easy to see how it could be a vase, as the model predicted.
Test Accuracy: 90.00%
Good Predictions:
LEFT: Ground Truth; RIGHT: Prediction
Accuracy: 99.00%
Accuracy: 98.94%
Accuracy: 98.07%
Failure Cases:
Accuracy: 69.34%
Accuracy: 76.37%
Accuracy: 80.63%
Again, the model does a good job, and its predictions sometimes even look better than the ground truth. In the first failure case, it does not predict the headrest, although the ground truth itself misses one armrest. In the second and third examples, it misses the headrest and also seems to label the armrests incorrectly; visually, it is hard to tell whether the prediction should contain armrests at all. In the fourth example, the ground truth misses the armrests while the prediction includes them. This may simply be due to labelling noise in the training data: among these sofa chairs, some labelled examples mark the armrests separately while others label everything as part of the seat.
Changing the number of points: We did this experiment by changing the --num_points flag
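Evaluating with fewer points essentially amounts to subsampling each cloud before feeding it to the model; below is a minimal sketch assuming uniform random sampling without replacement (the sampling strategy in the actual dataloader may differ):

```python
import torch

def subsample_points(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Randomly keep num_points points from an (N, 3) point cloud (assumes num_points <= N)."""
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx]
```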
Number Of Points | Classification | Segmentation |
---|---|---|
10000 (original) | 0.9789283479284842 | 0.9004591166936791 |
8000 | 0.9779643231899265 | 0.900402755267423 |
6000 | 0.9769150052465897 | 0.900402755267423 |
4000 | 0.9779643231899265 | 0.9007337925445705 |
2000 | 0.9790136411332634 | 0.8987844408427876 |
1024 | 0.9727177334732424 | 0.8945549913897893 |
512 | 0.9653725078698846 | 0.8860856462722853 |
256 | 0.9601259181532005 | 0.8628570705024311 |
128 | 0.9485834207764953 | 0.8233387358184765 |
64 | 0.9307450157397692 | 0.7833265802269044 |
32 | 0.881427072402938 | 0.7219408427876823 |
16 | 0.7586568730325288 | 0.6559967585089141 |
8 | 0.5582371458551941 | 0.580226904376013 |
Interpretation: The model is quite robust to changing the number of points. In particular, it seems we can safely use as few as 1024 points without a significant drop in performance.
Rotating the PointClouds: We generated rotation matrices for angles binned at 30 degrees and rotated the point clouds using those matrices. To run the code with rotation, add --rotate and specify --angle
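For reference, here is a minimal sketch of the kind of rotation applied in this experiment; the choice of the z-axis as the rotation axis and the (N, 3) tensor layout are assumptions, and the actual code lives behind the --rotate/--angle flags:

```python
import math
import torch

def rotate_point_cloud(points: torch.Tensor, angle_deg: float) -> torch.Tensor:
    """Rotate an (N, 3) point cloud by angle_deg degrees about the z-axis (axis choice is an assumption)."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Standard 3x3 rotation matrix about the z-axis.
    rot = torch.tensor([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]], dtype=points.dtype)
    return points @ rot.T  # (N, 3) @ (3, 3) -> (N, 3)
```

Evaluating at 30-degree bins then just means calling this with angle_deg in {0, 30, ..., 330}.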
Number Of Points | Classification | Segmentation |
---|---|---|
0 (original) | 0.9769150052465897 | 0.899960453808752 |
30 | 0.7292759706190975 | 0.8011361426256077 |
60 | 0.17523609653725078 | 0.5388372771474879 |
90 | 0.8551941238195173 | 0.1711679092382496 |
120 | 0.7460650577124869 | 0.24139270664505671 |
150 | 0.29590766002098634 | 0.39237341977309564 |
180 | 0.3252885624344176 | 0.6586907617504052 |
210 | 0.5299055613850997 | 0.5931923824959482 |
240 | 0.6977964323189927 | 0.4978306320907617 |
270 | 0.23189926547743966 | 0.308137925445705 |
300 | 0.316894018887723 | 0.46128152350081036 |
330 | 0.7785939139559287 | 0.7071842787682334 |
Interpretation: The model doesn't seem very robust to rotations. The classification model is worst at a 60-degree rotation, while for the segmentation model 90 degrees is the worst. These results are not surprising given that our model has no inductive bias that encourages it to be rotation invariant. Training with random rotation augmentations might help.
We implemented PointNet++ for both segmentation and classification. Following the paper, we implemented a SetAbstractionLayer that samples and groups points and applies a PointNet to each group. For classification, we extract the global features and pass them through an MLP to produce class probabilities. For segmentation, we upsample the downsampled feature maps: we interpolate features for the original 10000 points from the feature maps produced by each SetAbstractionLayer, concatenate all these upsampled features, and pass them through an MLP to get the 6-dimensional per-point class probabilities. For more details, please refer to models.py.
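As a rough illustration of the upsampling step, here is a sketch of inverse-distance weighted interpolation from the k nearest coarse points; the tensor shapes, k = 3, and the function name are assumptions rather than the exact code in models.py:

```python
import torch

def interpolate_features(dense_xyz: torch.Tensor,   # (N, 3) original points, e.g. N = 10000
                         coarse_xyz: torch.Tensor,  # (M, 3) points kept by a SetAbstractionLayer
                         coarse_feat: torch.Tensor, # (M, C) features of those points
                         k: int = 3) -> torch.Tensor:
    """Inverse-distance weighted interpolation of coarse features onto the dense points."""
    dists = torch.cdist(dense_xyz, coarse_xyz)                 # (N, M) pairwise distances
    knn_dists, knn_idx = dists.topk(k, dim=1, largest=False)   # (N, k) nearest coarse neighbours
    weights = 1.0 / (knn_dists + 1e-8)
    weights = weights / weights.sum(dim=1, keepdim=True)       # normalise weights to sum to 1
    # Weighted sum of neighbour features: (N, k, C) * (N, k, 1) -> (N, C)
    return (coarse_feat[knn_idx] * weights.unsqueeze(-1)).sum(dim=1)
```

Repeating this for the output of each SetAbstractionLayer and concatenating the results gives the per-point feature that the segmentation MLP consumes.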
Classification:
We get 97.27% with PointNet++ compared to 97.80% from PointNet. Note that we could only train for 90 epochs, since PointNet++ was more computationally expensive to train, so we expect better performance with longer training.
Here are some visualisations of PointNet++ compared to PointNet.
Examples where both are correct:
Examples where PointNet failed but PointNet++ succeeded:
Where both fail:
PointNet++ Prediction: Lamp; PointNet Prediction: Lamp; Ground Truth: Vase
PointNet++ Prediction: Chair; PointNet Prediction: Lamp; Ground Truth: Vase
Segmentation:
We get 89.52% with PointNet++ compared to 90.00% from PointNet. The PointNet++ segmentation model is even heavier computationally, and we only trained it for 13 epochs.
Good Predictions:
Accuracy: 98.54%
Accuracy: 97.65%
Failure Cases:
Accuracy: 67.46%
Accuracy: 77.03%
Accuracy: 81.30%