Name: Chih-Wei Wu
Email: chihweiw@andrew.cmu.edu
Here, I visualize several test point clouds and their corresponding predicted classes. The model gets most of the predictions right.
Still, there are some samples that the network predicts incorrectly. Here, I provide a failure case for each class. It turns out these samples are quite difficult to classify, even for a human. In the first sample, the tall chair looks like a floor lamp, such as the one shown above. For the second sample, we can't really tell whether it is a lamp or an elegantly shaped vase. As for the third one, the lamp hardly looks like a typical lamp at all. In short, I don't think we can really blame the network for getting these wrong.
Here, I visualize several test point clouds and their segmentation results. On the left is the ground truth; on the right is the prediction. The model pretty much nails the prediction, though it gets some small details wrong. For example, in the first example the model over-extends the chair-arm prediction (yellow) into the seat (red). This is because the boundary between the chair arm and the seat is fundamentally difficult to localize. In the second example, the seat is predicted incorrectly: the model extends the seat far down into the chair legs (blue), possibly because this chair has unusually styled legs rather than the typical pillar-like ones. The third example is pretty much correct.
Here, I provide some samples that have low accuracy. On the left is the ground truth; on the right is the prediction.
The first example is pretty bad, but the ground truth is a rather difficult one to segment. The headrest region (magenta) is connected to the chair back, which is why the network gets it wrong. The left chair arm (yellow) is not really obvious even to a human, and the chair legs (blue) account for only a small area. No wonder the model gets it so badly wrong.
The second example is also a difficult one. First, whether the pillow counts as the seat or the chair back is ambiguous. Second, the chair legs are actually an extension of the chair back and arms. The model has a hard time segmenting a single continuous shape into multiple categories.
First, I analyze how the model behaves with different numbers of points. I perform this experiment by randomly subsampling a fixed number of points from each sample, as sketched below.
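For reference, the subsampling looks roughly like this. This is a minimal sketch assuming the test clouds are stored as a `(B, N, 3)` tensor; `subsample_points` is a hypothetical helper name, not the actual assignment code:

```python
import torch

def subsample_points(points: torch.Tensor, num_points: int) -> torch.Tensor:
    # points: (B, N, 3) batch of point clouds. Keep num_points random points
    # per cloud, drawn without replacement (same indices across the batch).
    idx = torch.randperm(points.shape[1])[:num_points]
    return points[:, idx, :]

# e.g. evaluate with only 500 points per object:
# preds = model(subsample_points(test_points, 500))
```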
Classification
# of points | accuracy |
---|---|
10000 | 0.9780 |
5000 | 0.9370 |
2000 | 0.9391 |
1000 | 0.9339 |
500 | 0.9391 |
200 | 0.9328 |
100 | 0.9129 |
Segmentation
# of points | accuracy |
---|---|
10000 | 0.9036 |
5000 | 0.9033 |
2000 | 0.9023 |
1000 | 0.8965 |
500 | 0.8832 |
200 | 0.8468 |
100 | 0.8142 |
In theory, decreasing the number of points degrades the shape of the object and should decrease accuracy. We can see this trend in both the classification and segmentation models. But to my surprise, the accuracy holds up quite well down to roughly 1,000 points. This shows that the model is fairly robust, in both classification and segmentation, to the number of points sampled per object.
Next, I analyze the model's robustness to rotation. I perform this experiment by rotating all objects by a fixed angle around the x, y, or z axis. The results are shown below.
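The rotation itself is a standard 3D rotation matrix applied to every point. A minimal sketch, again assuming `(B, N, 3)` clouds; `rotate_points` is a hypothetical helper, not the assignment's code:

```python
import math
import torch

def rotate_points(points: torch.Tensor, angle_deg: float, axis: str) -> torch.Tensor:
    # points: (B, N, 3); rotate every cloud by angle_deg around the given axis.
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    if axis == "x":
        R = [[1, 0, 0], [0, c, -s], [0, s, c]]
    elif axis == "y":
        R = [[c, 0, s], [0, 1, 0], [-s, 0, c]]
    else:  # "z"
        R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    R = torch.tensor(R, dtype=points.dtype, device=points.device)
    # Row-vector convention: p' = p @ R^T.
    return points @ R.T
```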
Classification (rotate around x-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9780 |
30 | 0.6884 |
60 | 0.2046 |
90 | 0.2854 |
Segmentation (rotate around x-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9036 |
30 | 0.7959 |
60 | 0.5273 |
90 | 0.2406 |
Classification (rotate around y-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9780 |
30 | 0.9087 |
60 | 0.7597 |
90 | 0.4764 |
Segmentation (rotate around y-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9036 |
30 | 0.7843 |
60 | 0.6603 |
90 | 0.5633 |
Classification (rotate around z-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9780 |
30 | 0.8573 |
60 | 0.7282 |
90 | 0.2812 |
Segmentation (rotate around z-axis)
rotation angle (degrees) | accuracy |
---|---|
0 | 0.9036 |
30 | 0.6910 |
60 | 0.5339 |
90 | 0.3820 |
In theory, a PointNet-like model could handle rotation through the input transform (T-Net) that the original authors use to align the input points. Because the assignment does not require implementing the T-Net, the network handles rotation poorly: the accuracy drops significantly as the rotation angle increases. Also, segmentation accuracy tends to drop more than classification accuracy at moderate angles (though at 90 degrees, classification collapses to roughly chance level for three classes). This is because classification depends only on the global shape rather than the location of individual points, while segmentation needs to interpret the position of each point. Lastly, per the tables above, rotating around the y-axis hurts performance the least, possibly because the dataset contains many objects that are roughly symmetric around their vertical axis, so some performance is retained.
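For reference, the input transform is itself a small PointNet that regresses a 3x3 alignment matrix applied to the points. Below is a minimal PyTorch sketch in the spirit of the original T-Net (layer sizes follow the paper; this is an illustration, not the assignment's code):

```python
import torch
import torch.nn as nn

class TNet(nn.Module):
    """Predicts a 3x3 transform that is applied to the input points."""
    def __init__(self):
        super().__init__()
        # Shared per-point MLP, implemented as 1x1 convolutions.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3)
        feat = self.mlp(points.transpose(1, 2))   # (B, 1024, N)
        feat = feat.max(dim=2).values             # symmetric max pool -> (B, 1024)
        mat = self.fc(feat).view(-1, 3, 3)        # regressed 3x3 matrix
        # Add the identity so the transform starts near "do nothing".
        mat = mat + torch.eye(3, dtype=points.dtype, device=points.device)
        return points @ mat                       # aligned points, (B, N, 3)
```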