1

Test Accuracy of best model: 97.87%

Random Visualizations:

Failure Cases:

The model overall does pretty well on most categories, and the failures are not unreasonable. In the first case, the chair is folded and so doesn't look like a typical chair. The second is ambiguous even for a human: it could be a lamp or a vase. For the third, the model predicts chair, which does not seem correct at all, though the object is broad and may resemble chairs in size and appearance. The fourth is again a tough one: one can see how it could be a vase, as the model predicted.

2

Test Accuracy: 90.00%

Good Predictions:

LEFT: Ground Truth; RIGHT: Prediction

Failure Cases:

Again the model does a pretty good job, sometimes even better than the ground truth. In the first failure case, it does not predict the headrest, though the ground truth itself misses one armrest. In the second and third examples, it misses the headrest and also outputs questionable labels for the armrests; it is hard to tell whether armrests should appear in the prediction at all. In the fourth example, the ground truth misses the armrests while the prediction includes them. This is likely due to labelling noise in the training data: for these sofa chairs, some labelled examples mark the armrests, while others label everything as part of the seat.

Robustness Tests

Changing the number of points: We ran this experiment by varying the --num_points flag.

| Number of Points | Classification Accuracy | Segmentation Accuracy |
| --- | --- | --- |
| 10000 (original) | 0.9789283479284842 | 0.9004591166936791 |
| 8000 | 0.9779643231899265 | 0.900402755267423 |
| 6000 | 0.9769150052465897 | 0.900402755267423 |
| 4000 | 0.9779643231899265 | 0.9007337925445705 |
| 2000 | 0.9790136411332634 | 0.8987844408427876 |
| 1024 | 0.9727177334732424 | 0.8945549913897893 |
| 512 | 0.9653725078698846 | 0.8860856462722853 |
| 256 | 0.9601259181532005 | 0.8628570705024311 |
| 128 | 0.9485834207764953 | 0.8233387358184765 |
| 64 | 0.9307450157397692 | 0.7833265802269044 |
| 32 | 0.881427072402938 | 0.7219408427876823 |
| 16 | 0.7586568730325288 | 0.6559967585089141 |
| 8 | 0.5582371458551941 | 0.580226904376013 |

Interpretation: The model seems fairly robust to changes in the number of points. In particular, we can safely use as few as 1024 points without losing significant performance.
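For reference, here is a minimal sketch of the kind of random subsampling this experiment relies on, assuming the test point clouds are stored as (B, N, 3) PyTorch tensors; the function name and tensor layout are illustrative, and only the --num_points flag above comes from our code.

```python
import torch

def subsample_points(points, num_points):
    """Randomly subsample each point cloud to `num_points` points.

    points: (B, N, 3) tensor (N = 10000 in our data).
    Returns a (B, num_points, 3) tensor; assumes num_points <= N.
    """
    N = points.shape[1]
    idx = torch.randperm(N, device=points.device)[:num_points]
    return points[:, idx, :]

# e.g. evaluate with 1024 points per cloud:
# test_points = subsample_points(test_points, 1024)
```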

Rotating the point clouds: We generated a rotation matrix for each angle, binned at 30-degree increments, and rotated the point clouds with it. To run the code with rotation, add --rotate and specify --angle.

| Rotation Angle (degrees) | Classification Accuracy | Segmentation Accuracy |
| --- | --- | --- |
| 0 (original) | 0.9769150052465897 | 0.899960453808752 |
| 30 | 0.7292759706190975 | 0.8011361426256077 |
| 60 | 0.17523609653725078 | 0.5388372771474879 |
| 90 | 0.8551941238195173 | 0.1711679092382496 |
| 120 | 0.7460650577124869 | 0.24139270664505671 |
| 150 | 0.29590766002098634 | 0.39237341977309564 |
| 180 | 0.3252885624344176 | 0.6586907617504052 |
| 210 | 0.5299055613850997 | 0.5931923824959482 |
| 240 | 0.6977964323189927 | 0.4978306320907617 |
| 270 | 0.23189926547743966 | 0.308137925445705 |
| 300 | 0.316894018887723 | 0.46128152350081036 |
| 330 | 0.7785939139559287 | 0.7071842787682334 |

Interpretation: The model does not seem very robust to rotations. The classification model is worst at a 60-degree rotation, while for the segmentation model 90 degrees is the worst. These results are not surprising given that our model has no inductive bias that encourages rotation invariance. Training with random rotation augmentations might help.
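To make the rotation setup concrete, here is a small sketch of how a point cloud can be rotated by a given angle; the axis choice below is an illustrative assumption (the experiment above only fixes the 30-degree binning), and the actual runs are driven by the --rotate/--angle flags.

```python
import numpy as np

def rotate_pointcloud(points, angle_deg, axis="z"):
    """Rotate an (N, 3) point cloud by `angle_deg` degrees about one axis."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    if axis == "z":
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    elif axis == "y":
        R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    else:  # "x"
        R = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    return points @ R.T

# e.g. the 90-degree setting from the table above:
# rotated = rotate_pointcloud(points, 90)
```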

4

We implemented PointNet++ for both segmentation and classification. Following the paper, we implemented a SetAbstractionLayer that samples and groups points and applies a small PointNet to each group. For classification, we extract the global features and pass them through an MLP to produce class probabilities. For segmentation, we upsample the downsampled feature maps: we interpolate features for the original 10,000 points from the feature maps produced by each SetAbstractionLayer, concatenate these upsampled features, and pass them through an MLP to obtain 6-dimensional per-point class probabilities. For more details, please refer to models.py.
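To make the set-abstraction step concrete, here is a simplified sketch, assuming (B, N, 3) coordinates and (B, N, C) per-point features in PyTorch. It uses random sampling and k-nearest-neighbour grouping for brevity (the PointNet++ paper uses farthest-point sampling and ball query), and the class name and layer widths are illustrative rather than the exact ones in models.py.

```python
import torch
import torch.nn as nn

class SimpleSetAbstraction(nn.Module):
    """Sketch of a PointNet++ set-abstraction layer: sample centroids,
    group neighbouring points, and run a shared PointNet (MLP + max-pool)
    on each group."""

    def __init__(self, num_centroids, k, in_dim, out_dim):
        super().__init__()
        self.num_centroids, self.k = num_centroids, k
        self.mlp = nn.Sequential(
            nn.Linear(in_dim + 3, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim), nn.ReLU(),
        )

    def forward(self, xyz, feats):
        # xyz: (B, N, 3) coordinates; feats: (B, N, C) per-point features.
        B, N, _ = xyz.shape
        M, k, C = self.num_centroids, self.k, feats.shape[-1]

        # 1) Sample centroids (random here; farthest-point sampling in the paper).
        idx = torch.randperm(N, device=xyz.device)[:M]
        centroids = xyz[:, idx, :]                                    # (B, M, 3)

        # 2) Group: k nearest neighbours of each centroid.
        knn = torch.cdist(centroids, xyz).topk(k, dim=-1, largest=False).indices
        grouped_xyz = torch.gather(
            xyz.unsqueeze(1).expand(B, M, N, 3), 2,
            knn.unsqueeze(-1).expand(-1, -1, -1, 3))                  # (B, M, k, 3)
        grouped_feats = torch.gather(
            feats.unsqueeze(1).expand(B, M, N, C), 2,
            knn.unsqueeze(-1).expand(-1, -1, -1, C))                  # (B, M, k, C)

        # 3) Shared PointNet on centroid-relative coordinates, max-pooled per group.
        grouped = torch.cat([grouped_xyz - centroids.unsqueeze(2), grouped_feats], -1)
        new_feats = self.mlp(grouped).max(dim=2).values               # (B, M, out_dim)
        return centroids, new_feats

# e.g. sa = SimpleSetAbstraction(num_centroids=512, k=32, in_dim=3, out_dim=128)
#      centroids, feats = sa(xyz, xyz)   # using coordinates as the initial features
```

For the segmentation branch described above, the coarser features are interpolated back to all 10,000 points before the final MLP; the paper does this with inverse-distance weighting over the three nearest centroids.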

Classification:

We get 97.27% with PointNet++, compared to 97.80% from PointNet. Note that we could only train for 90 epochs, since PointNet++ is more computationally expensive to train, so we expect better performance with longer training.

Here are some visualizations of PointNet++ compared to PointNet:

Examples where both are correct:

Examples where PointNet failed but PointNet++ succeeded:

Where both fail:

Segmentation:

We get 89.52% with PointNet++, compared to 90.00% from PointNet. PointNet++ for segmentation is even more computationally expensive, and we only trained it for 13 epochs.


Failure Cases:

Late Days: 5!