Assignment 4
Name: Mayank Agarwal
Andrew ID: mayankag
Late Days Used: 0
Q1. Classification Model (40 points)
# Training Command
python train.py --task cls
Accuracy
# Evaluation Command
python eval_cls.py
Test Accuracy of best model: 97.69%
Visualizations (Correct Predictions)
| Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Chair | Chair |
| ![]() | Chair | Chair |
| ![]() | Chair | Chair |
| ![]() | Vase | Vase |
| ![]() | Vase | Vase |
| ![]() | Vase | Vase |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Lamp |
| ![]() | Lamp | Lamp |
Visualizations (Failure Cases)
| Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Chair | Lamp |
| ![]() | Vase | Lamp |
| ![]() | Vase | Lamp |
| ![]() | Vase | Lamp |
| ![]() | Lamp | Vase |
| ![]() | Lamp | Vase |
| ![]() | Lamp | Vase |
Observations
As shown in the visualizations above, only one chair was misclassified (as a lamp). That sample lacks a seating platform, so it is easy to mistake for something other than a chair. Most of the remaining errors are between lamps and vases: lamps are frequently predicted as vases and vice versa. This suggests that the chair category is geometrically distinct from lamps and vases and is therefore easier to classify. Another reason for the strong chair accuracy may be data imbalance; I observed that there is more training data for chairs than for the other classes. Lamps and vases, in contrast, share a lot of structural and shape similarity, and both categories exhibit more intra-class diversity than chairs, which likely drives the network's confusion.
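To make this lamp/vase confusion concrete, a confusion matrix over the test predictions would show exactly where the errors concentrate. Below is a minimal sketch, assuming integer class indices (0 = chair, 1 = vase, 2 = lamp) and that `labels`/`preds` arrays are collected from the evaluation script; these names and the class ordering are assumptions, not the actual script's variables.

```python
import numpy as np

def confusion_matrix(labels, preds, num_classes=3):
    # Rows index the ground-truth class, columns the predicted class.
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(labels, preds):
        cm[t, p] += 1
    return cm

# Hypothetical usage: labels/preds are (N,) integer arrays gathered
# during evaluation. Off-diagonal mass concentrated in the (vase, lamp)
# and (lamp, vase) cells would confirm the confusion described above.
```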
Q2. Segmentation Model (40 points)
# Training Command
python train.py --task seg
Accuracy
# Evaluation Command
python eval_seg.py
Test Accuracy of best model: 90.35%
Visualizations (Most Accurate Predictions)
| Prediction (left) vs. Ground Truth (right) | Accuracy |
|---|---|
| ![]() | 99.56% |
| ![]() | 99.55% |
| ![]() | 99.54% |
Visualizations (Least Accurate Predictions)
| Prediction (left) vs. Ground Truth (right) | Accuracy |
|---|---|
| ![]() | 42.36% |
| ![]() | 46.75% |
| ![]() | 48.74% |
Observations
Above, I have visualized the most accurate and least accurate predictions separately. As we can see, well-defined chairs (those close to the mean chair shape) have clear part segmentations for the back, armrests, seat, legs, etc. These are easier to segment and achieve the highest prediction accuracy, as shown in the first table. In contrast, more complicated chairs with loosely defined armrests, backrests, and seats have poor segmentation accuracy. Given the inherent ambiguity, it would be difficult even for a human to segment such designer chairs (or sofas) correctly, so the same can be expected of the network.
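For reference, the per-object accuracy reported in these tables is point-wise: the fraction of points whose predicted part label matches the ground truth. A minimal sketch of that metric, with illustrative (not the actual script's) variable names:

```python
import torch

def per_object_accuracy(pred_labels: torch.Tensor, gt_labels: torch.Tensor) -> float:
    # Fraction of points in a single object whose predicted part
    # label matches the ground-truth part label. Both tensors are
    # assumed to have shape (num_points,).
    return (pred_labels == gt_labels).float().mean().item()

# Hypothetical usage: score every test object with this metric, then
# sort to pick the most and least accurate predictions to visualize.
```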
Q3. Robustness Analysis (20 points)
I have performed the following two robustness-analysis experiments (a sketch of both perturbations follows this list):
- Evaluating the point clouds with fewer input points (the points are randomly subsampled).
- Rotating the point clouds about the z-axis by varying angles and observing the effect on classification and segmentation accuracy.
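Both perturbations are straightforward to apply at evaluation time. The sketch below shows one way to implement them, assuming point clouds stored as (N, 3) NumPy arrays; the function names are hypothetical and not necessarily how eval_cls.py / eval_seg.py implement them internally.

```python
import numpy as np

def subsample_points(points: np.ndarray, num_points: int) -> np.ndarray:
    # Randomly keep num_points of the N input points; points is (N, 3).
    idx = np.random.choice(points.shape[0], num_points, replace=False)
    return points[idx]

def rotate_about_z(points: np.ndarray, degrees: float) -> np.ndarray:
    # Rotate the whole cloud about the z-axis by the given angle.
    theta = np.deg2rad(degrees)
    rot = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    return points @ rot.T
```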
Robustness Analysis for Classification
# Evaluation Command
python eval_cls.py --num_points NUM_POINTS --rot ROTATION_DEGREE
The accuracy from Q1 is shown in bold in the first row of each table below.
Changing number of points for evaluation (keeping rotation fixed)
| Number of points per object | Rotation (degrees) | Test Accuracy of best model (%) |
|---|---|---|
| 10000 | 0 | **97.69** |
| 5000 | 0 | 97.80 |
| 2500 | 0 | 97.69 |
| 1250 | 0 | 97.69 |
| 625 | 0 | 96.75 |
| 312 | 0 | 96.54 |
| 156 | 0 | 94.65 |
| 78 | 0 | 87.20 |
| 40 | 0 | 67.47 |
| 16 | 0 | 32.53 |
Interpretation
As we decrease the number of points, accuracy drops only slightly until num_points reaches 312, and then falls substantially as the points are reduced further. This is expected, since 156 points are not sufficient to define the geometry of the objects. As the failure cases below show, even a human would struggle to classify these objects correctly. For example, too few points are sampled on the seats of the chairs (a key feature), making it hard for the network to classify them correctly.
Visualizing failure cases
With very few points (num_points = 156), the model makes noticeably more mistakes. Some chairs are incorrectly predicted as vases or lamps; these same chairs were classified correctly when more points were used.
| Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Chair | Vase |
| ![]() | Chair | Lamp |
| ![]() | Chair | Lamp |
On decreasing the number of input points further (num_points = 78), some vases are incorrectly predicted as chairs.
| Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Vase | Chair |
Changing rotation (keeping number of points fixed)
| Number of points per object | Rotation (degrees) | Test Accuracy of best model (%) |
|---|---|---|
| 10000 | 0 | **97.69** |
| 10000 | 5 | 97.17 |
| 10000 | -5 | 97.17 |
| 10000 | 10 | 96.54 |
| 10000 | -10 | 95.59 |
| 10000 | 20 | 84.58 |
| 10000 | -20 | 91.08 |
| 10000 | 30 | 51.10 |
| 10000 | -30 | 76.18 |
| 10000 | 90 | 23.82 |
| 10000 | -90 | 24.24 |
Interpretation
From the above results, we can see that the network is sensitive to rotation: there is a considerable drop in classification accuracy once the point clouds are rotated by 20 degrees (about the z-axis). This is also intuitive, since a rotated chair may land closer to the mean lamp orientation (which features hanging parts). A few failure cases are visualized below.
Visualizing failure cases
When we rotate the point clouds by 20 degrees, we observe that some chairs are misclassified as lamps. These chairs were correctly classified when their orientation was axis-aligned.
| Point Cloud | Ground Truth | Prediction |
|---|---|---|
| ![]() | Chair | Lamp |
| ![]() | Chair | Lamp |
| ![]() | Chair | Lamp |
Robustness Analysis for Segmentation
# Evaluation Command
python eval_seg.py --num_points NUM_POINTS --rot ROTATION_DEGREE
The accuracy from Q2 is shown in bold in the first row of each table below.
Changing number of points for evaluation (keeping rotation fixed)
| Number of points per object | Rotation (degrees) | Test Accuracy of best model (%) | Best Prediction (Prediction, left vs. Ground Truth, right) | Worst Prediction (Prediction, left vs. Ground Truth, right) |
|---|---|---|---|---|
| 10000 | 0 | **90.35** | ![]() | ![]() |
| 5000 | 0 | 90.31 | ![]() | ![]() |
| 2500 | 0 | 90.12 | ![]() | ![]() |
| 1250 | 0 | 89.67 | ![]() | ![]() |
| 625 | 0 | 88.40 | ![]() | ![]() |
| 312 | 0 | 85.87 | ![]() | ![]() |
| 156 | 0 | 82.14 | ![]() | ![]() |
| 78 | 0 | 78.48 | ![]() | ![]() |
| 40 | 0 | 75.31 | ![]() | ![]() |
| 16 | 0 | 72.18 | ![]() | ![]() |
Interpretation
Given the above results, segmentation is fairly robust to the number of input points. Even with very few points, the model correctly predicts the segmentation for well-defined chairs with clearly visible parts (see the second-to-last column). As the number of points decreases, the model continues to perform well on well-defined chairs, but it struggles with difficult chairs that have ambiguous segments.
Changing rotation (keeping number of points fixed)
| Number of points per object | Rotation (degrees) | Test Accuracy of best model (%) | Best Prediction (Prediction, left vs. Ground Truth, right) | Worst Prediction (Prediction, left vs. Ground Truth, right) |
|---|---|---|---|---|
| 10000 | 0 | **90.35** | ![]() | ![]() |
| 10000 | 5 | 89.35 | ![]() | ![]() |
| 10000 | -5 | 89.39 | ![]() | ![]() |
| 10000 | 10 | 86.54 | ![]() | ![]() |
| 10000 | -10 | 86.72 | ![]() | ![]() |
| 10000 | 20 | 78.36 | ![]() | ![]() |
| 10000 | -20 | 77.99 | ![]() | ![]() |
| 10000 | 30 | 70.28 | ![]() | ![]() |
| 10000 | -30 | 69.58 | ![]() | ![]() |
| 10000 | 90 | 43.85 | ![]() | ![]() |
| 10000 | -90 | 41.21 | ![]() | ![]() |
Interpretation
Up to a certain angle, the segmentation model is robust to rotation as well. However, its accuracy drops considerably when the point clouds are rotated by 90 degrees. The model appears to assign part labels largely based on location rather than structure: even for rotated point clouds, it labels the top as backrest (light blue), the middle as seat (red), and the bottom as legs (dark blue). This is expected, since the model never saw these transformations during training. A good fix would be to apply rotation augmentations at training time, as sketched below.
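A minimal sketch of such an augmentation, applied per batch inside the training loop; the (B, N, 3) batch layout and the angle range are assumptions, not the actual train.py code.

```python
import torch

def random_z_rotation(points: torch.Tensor, max_degrees: float = 90.0) -> torch.Tensor:
    # points: (B, N, 3). Rotate each batch element about the z-axis
    # by an independent random angle in [-max_degrees, max_degrees].
    angles = torch.empty(points.shape[0]).uniform_(-max_degrees, max_degrees)
    theta = torch.deg2rad(angles)
    cos, sin = torch.cos(theta), torch.sin(theta)
    zeros, ones = torch.zeros_like(theta), torch.ones_like(theta)
    rot = torch.stack([
        torch.stack([cos, -sin, zeros], dim=-1),
        torch.stack([sin,  cos, zeros], dim=-1),
        torch.stack([zeros, zeros, ones], dim=-1),
    ], dim=-2)  # (B, 3, 3), one rotation matrix per batch element
    return torch.bmm(points, rot.transpose(1, 2))

# Hypothetical usage inside the training loop, before the forward pass:
# points = random_z_rotation(points)
```

Since segmentation labels are per-point and a rigid rotation does not reorder points, the labels need no corresponding transformation.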