[16-889] Learning for 3D Vision

PointNet for classification and segmentation

 

Q1. Classification Model

Correct predictions

Ground-truth: Chair

(renders: 000_pred_0_gt_0.0, 003_pred_0_gt_0.0)


Ground-truth: Vase

(render: 007_pred_1_gt_1.0)


Ground-truth: Lamp

(renders: 001_pred_2_gt_2.0, 004_pred_2_gt_2.0, 005_pred_2_gt_2.0)


 

Wrong predictions

Ground-truth: Chair | Prediction: Lamp

(render: 212_pred_2_gt_0.0)


Ground-truth: Vase | Prediction: Lamp

(render: 039_pred_2_gt_1.0)


Ground-truth: Lamp | Prediction: Vase

(render: 041_pred_1_gt_2.0)

 

Interpretation of results

The correctly classified samples share similar semantic structure within each category.

The incorrect predictions offer some insight. The chair category has only one misclassified test sample. Lacking a common chair structure such as four legs or a Cesca-chair-like cantilever base, this particular object is less representative of the chair category and was therefore misclassified.

The vase samples that were misclassified as lamps have roughly vertical structures resembling those of many objects in the lamp class, which plausibly explains these errors.

Likewise, the lamp samples that were misclassified as vases have bulbous, vase-like structures resembling many objects in the vase class.

 

Q2. Segmentation Model

Top-5 predictions

Left: Prediction | Right: Ground-truth

(prediction / ground-truth renders for each sample)

Accuracy: 99.55%

Accuracy: 99.55%

Accuracy: 99.42%

Accuracy: 99.38%

Accuracy: 99.35%


Bottom-5 predictions

Left: Prediction | Right: Ground-truth

(prediction / ground-truth renders for each sample)

Accuracy: 43.67%

Accuracy: 47.11%

Accuracy: 47.66%

Accuracy: 48.99%

Accuracy: 49.34%

Interpretation of results

The top-5 predictions all have clearly segmentable chair structures (back, seat, legs, etc.), so these samples achieve very high segmentation accuracy. In fact, none of these samples deviate much from a canonical chair skeleton.

In the bottom-5 predictions, however, the point clouds have unusual part layouts. The first ground-truth point cloud is very noisy, so low accuracy is expected. The second includes an extra object labeled as part of the chair that is absent from other chair examples. The remaining three samples have parts that are not visually distinct, so the boundary between one part and another is unclear and the model has a hard time segmenting them correctly.
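The per-sample accuracy reported above is simply the fraction of points whose predicted part label matches the ground truth. A minimal sketch of that metric, assuming integer label arrays in NumPy (the helper name `seg_accuracy` is illustrative, not from the assignment code):

```python
import numpy as np

def seg_accuracy(pred_labels, gt_labels):
    """Fraction of points whose predicted part label matches the ground truth."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    return float((pred_labels == gt_labels).mean())
```

For a 10,000-point chair, an accuracy of 99.55% means roughly 45 points received the wrong part label.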

 

Q3. Robustness Analysis

Pointclouds rotation

The input point clouds were rotated about the Z-axis by angles ranging from 0 to 360 degrees. Model accuracy over all test samples was computed at each rotation. The plots for the classification and segmentation tasks are shown below.
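The rotation sweep can be sketched as follows. This is an illustrative NumPy version under assumed conventions ((N, 3) arrays, angle in degrees); the actual evaluation code and its function names may differ:

```python
import numpy as np

def rotate_z(points: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the Z-axis by `degrees`."""
    theta = np.deg2rad(degrees)
    c, s = np.cos(theta), np.sin(theta)
    # Counter-clockwise rotation in the XY plane; the Z coordinate is unchanged.
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

# Sweep the full range used in the ablation, evaluating accuracy at each angle:
# accuracies = [evaluate(model, rotate_z(cloud, a)) for a in range(0, 361, 10)]
```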

Classification task

task3_pcl_rotate_cls

Segmentation task

task3_pcl_rotate_seg

Accuracy on both tasks follows the expected pattern as the rotation angle varies: performance degrades until the point cloud is essentially upside down (180 deg.) and recovers from there on. The segmentation task was evaluated only on the chair category, and it shows no performance spikes at unexpected orientations. This is not the case for classification: around 180 deg. there is a small spike in performance. This warranted further investigation, and the per-category accuracy for this ablation is shown below:

task3_pcl_rot_sep_cls

There is a spike for the lamp and chair categories. Since most lamps are roughly vertical structures, a rotation of 180 deg. approximately maps the shape onto itself, so performance improves somewhat in this range. This reasoning does not hold for the vase category, which shows the expected drop in performance in this range.

A surprising finding is that the chair class also improves when rotated upside down. Relatedly, the chair class has extremely few false positives, which suggests the model has learned distinctive features for chairs that make them easy to classify. The chair components that keep roughly the same orientation under this rotation are the legs and the back, and these distinctive features allow the network to classify chairs reliably even at a 180 deg. rotation.

 

Pointcloud resolution

Each test point cloud was subsampled to a fixed number of points, ranging from 100 to 10,000.
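The subsampling step can be sketched as uniform random sampling without replacement. This is a NumPy sketch under assumed conventions; the assignment code may instead slice or sample with torch:

```python
import numpy as np

def subsample(points: np.ndarray, num_points: int, seed: int = 0) -> np.ndarray:
    """Randomly sample `num_points` points (without replacement) from an (N, 3) cloud."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=num_points, replace=False)
    return points[idx]

# Resolutions swept in the ablation, from 100 up to 10,000 points.
resolutions = [100, 500, 1000, 2000, 5000, 10000]
```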

Classification task

task3_pcl_num_samples_cls

Segmentation task

task3_pcl_num_samples_seg

Both plots show the expected trend: as point-cloud resolution increases, accuracy on both tasks improves. With few points, it is harder to discriminate between classes; as the number of points grows, the shape of the object becomes clearer and both tasks become easier.