Training:
```
python train.py --task cls
```
Evaluation (with visualization):
```
python eval_cls.py --load_checkpoint best_model --visualize
```
The model checkpoints will be stored in `./checkpoints/cls` and the visualizations in `./output/cls`.
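As a rough sketch of what the evaluation script does under the hood, the snippet below loads the saved classification checkpoint and predicts a class for a single point cloud. The module name `models`, the class name `cls_model`, the checkpoint filename, and the class-index mapping are assumptions for illustration, not necessarily the repo's exact names.

```python
import torch
from models import cls_model  # assumed module/class name

# Load the trained classifier from the checkpoint directory (filename assumed).
model = cls_model()
model.load_state_dict(torch.load("./checkpoints/cls/best_model.pt"))
model.eval()

# Classify one point cloud of shape (1, N, 3); the point count is illustrative.
points = torch.rand(1, 10000, 3)
with torch.no_grad():
    logits = model(points)              # (1, num_classes)
    pred_class = logits.argmax(dim=-1)  # e.g. chair / vase / lamp (mapping assumed)
```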
| Input Point Cloud | Correct Prediction |
|---|---|
| ![]() | Lamp |
| ![]() | Lamp |
| ![]() | Vase |
| ![]() | Vase |
| ![]() | Chair |
| ![]() | Chair |
Interestingly, all the chairs in the test set were correctly classified by the model. The only misclassified examples were from the lamp and vase categories.
| Input Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Lamp | Vase |
| ![]() | Lamp | Vase |
| ![]() | Vase | Lamp |
| ![]() | Vase | Lamp |
That no chairs were misclassified suggests that the distribution of chairs is the most distinct of the three categories, well separated from both lamps and vases. Indeed, looking through the dataset, this does seem to be the case visually.
The inter-class confusion between the lamp and vase categories can be attributed to some overlap in their distributions. For instance, looking at the first two point clouds, even I as a human am inclined to label them as vases. I don't see a bulb or any other source of illumination in either point cloud and am left wondering whether these were mislabeled.
Even for the fourth point cloud, I am almost evenly split on whether it is a vase or a lamp. The bulbous shape of the vase certainly looks like something that could emit light in a lamp. Similarly, the flower/blossom object in the third point cloud looks very similar to a bulb; if it weren't for the leaf behind it, I might also have labelled it a lamp.
Training:
```
python train.py --task seg
```
Evaluation (with visualization):
```
python eval_seg.py --load_checkpoint best_model --visualize
```
The model checkpoints will be stored in `./checkpoints/seg` and the visualizations in `./output/seg`. Each output GIF's filename is prefixed with the number of correctly classified points in the point cloud.
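The per-point accuracies reported below are simply the fraction of points whose predicted segment label matches the ground truth. A minimal sketch (the function name and tensor shapes are illustrative, not the repo's exact code):

```python
import torch

def per_point_accuracy(pred_labels: torch.Tensor, gt_labels: torch.Tensor) -> float:
    """Fraction of points whose predicted segment label matches the ground truth.

    Both tensors are assumed to have shape (N,), one label per point; the
    integer count of correct points is what prefixes each output GIF.
    """
    num_correct = (pred_labels == gt_labels).sum().item()
    return num_correct / gt_labels.numel()
```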
| Ground Truth | Prediction | Accuracy |
|---|---|---|
| ![]() | ![]() | 99.72 % |
| ![]() | ![]() | 99.56 % |
| ![]() | ![]() | 99.53 % |
| ![]() | ![]() | 99.50 % |
| ![]() | ![]() | 99.24 % |
| Ground Truth | Prediction | Accuracy |
|---|---|---|
| ![]() | ![]() | 41.52 % |
| ![]() | ![]() | 54.21 % |
| ![]() | ![]() | 55.25 % |
| ![]() | ![]() | 62.49 % |
Firstly, all the "bad" predictions seem to be of chairs that look very different from a canonical/typical chair, and thus also appear to be outliers with respect to the distribution of chairs in our dataset. Secondly, how the different segments of a chair are defined seems to vary from instance to instance, in ways that even I as a human have difficulty resolving.
For instance, in the first example, the model predicted the lower half of the chair as "base", whereas the ground truth labels it as part of the armrest. The ground-truth choice seems unnatural to me, because as humans we would be more likely to segment the bottom half of this chair as the base rather than as an extension of the armrest.
The same phenomenon appears in the second example, where the model extends the armrests much lower down the object, but is penalized because the ground truth labels that lower region as the base.
The third point cloud seems like a far outlier because its structure doesn't naturally decompose into a base, headrest, armrests, etc., the way most of the other chairs in the dataset do.
Finally, the fourth point cloud is also an out-of-distribution example: it is a folded chair, whereas most of the other objects show a chair in its fully unfolded state.
Input a different number of points per object (`--num_points`) when evaluating than the model was actually trained on.
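A minimal sketch of what varying `--num_points` amounts to at evaluation time, i.e. subsampling each point cloud to the requested size; the helper name and the random-sampling choice are illustrative, not necessarily the repo's exact code:

```python
import torch

def subsample_point_cloud(points: torch.Tensor, num_points: int) -> torch.Tensor:
    """Randomly keep `num_points` of the (N, 3) input points.

    Illustrative helper only; the evaluation scripts may sample differently.
    """
    idx = torch.randperm(points.shape[0])[:num_points]
    return points[idx]
```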
The accuracy from Q1 is in the first row of the table above.
The accuracy from Q2 is in the first row of the table above.
During evaluation, I rotated the points about the X-axis because the points are fairly spread out along this axis. Rotating about the Y-axis didn't change the point clouds much visually, and rotating about the Z-axis produced orientations similar to rotating about the X-axis. The rotation is applied using a pytorch3d transformation object.
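A minimal sketch of that rotation, assuming an (N, 3) point cloud tensor; the wrapper function is illustrative, while the pytorch3d `RotateAxisAngle` transform is the actual mechanism described above:

```python
import torch
from pytorch3d.transforms import RotateAxisAngle

def rotate_about_x(points: torch.Tensor, degrees: float) -> torch.Tensor:
    """Rotate an (N, 3) point cloud about the X-axis by `degrees`."""
    rot = RotateAxisAngle(angle=degrees, axis="X", degrees=True)
    return rot.transform_points(points)
```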
```
python eval_cls.py --load_checkpoint best_model --visualize --rotate DEGREES --exp_name NAME
```
where `DEGREES` is the angle (in degrees) to rotate the input by. The visualizations will be stored in `./output/NAME`.
| Rotation (degrees) | Example input | Accuracy |
|---|---|---|
| 0 | ![]() | 96.85 % |
| 15 | ![]() | 92.24 % |
| 30 | ![]() | 77.54 % |
| 45 | ![]() | 54.25 % |
| 60 | ![]() | 32.32 % |
| 75 | ![]() | 27.91 % |
| 90 | ![]() | 27.38 % |
The accuracy from Q1 corresponds to the first row (0° rotation) of the table above.
```
python eval_seg.py --load_checkpoint best_model --visualize --rotate DEGREES --exp_name NAME
```
where `DEGREES` is the angle (in degrees) to rotate the input by. The visualizations will be stored in `./output/NAME`.
| Rotation (degrees) | Example ground truth | Accuracy |
|---|---|---|
| 0 | ![]() | 89.92 % |
| 15 | ![]() | 83.40 % |
| 30 | ![]() | 70.92 % |
| 45 | ![]() | 49.41 % |
| 60 | ![]() | 34.50 % |
| 75 | ![]() | 30.20 % |
| 90 | ![]() | 26.39 % |
The accuracy from Q2 corresponds to the first row (0° rotation) of the table above.