Assignment 5
1. Classification Model
The test accuracy of my best model is 93.17%.
The misclassified objects are largely confused because of their global similarity to another class. For instance, some lamps look like plausible vases (and vice versa) when only the general shape is considered. Small details, such as foliage in vases, are not captured very well.
For class label “chair”, there were only 2 misclassified examples. The point cloud on the left is classified as “lamp”, and the one on the right is classified as “vase”.
(Renderings of the two misclassified chair point clouds omitted.)

For class label “vase”:

Misclassified as chair: (renderings omitted)

Misclassified as lamp: (renderings omitted)

For class label “lamp”:

Misclassified as chair: (renderings omitted)

Misclassified as vase: (renderings omitted)
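For reference, a minimal sketch of how such misclassified test examples can be gathered (the function, loader, and class-name list are illustrative placeholders, not my exact evaluation code):

```python
import torch

CLASS_NAMES = ["chair", "vase", "lamp"]

@torch.no_grad()
def collect_misclassified(model, test_loader, device):
    # Return (points, true_name, predicted_name) for every misclassified cloud.
    model.eval()
    mistakes = []
    for points, labels in test_loader:                   # points: (B, N, 3), labels: (B,)
        preds = model(points.to(device)).argmax(dim=-1).cpu()
        for i in torch.nonzero(preds != labels).flatten():
            mistakes.append((points[i],
                             CLASS_NAMES[labels[i].item()],
                             CLASS_NAMES[preds[i].item()]))
    return mistakes
```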
2. Segmentation Model
The test accuracy of my best model is 87.78%. The last two rows below are failure cases of the model, which gets confused about the extent of the arms and the seat.
Some of these issues could be due to the lack of texture and image information. For example, in failure case 1 the model segments the whole lower part of the chair as the seat, whereas the ground-truth segmentation separates the seat from the legs. This suggests that the legs are made of a different material than the seat, a cue that could only be obtained from texture information.
(Renderings omitted: each row showed the ground-truth (GT) segmentation alongside the predicted (Pred) segmentation.)
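The segmentation accuracy reported here is presumably a per-point accuracy; a minimal sketch of such a metric (the function and tensor names are mine, not the actual evaluation code):

```python
import torch

def seg_accuracy(pred_logits, gt_labels):
    # pred_logits: (B, N, num_parts) per-point scores, gt_labels: (B, N) part indices
    pred = pred_logits.argmax(dim=-1)
    return (pred == gt_labels).float().mean().item()
```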
3. Robustness Analysis
Experiment-1
We check the effect of rotating the input points about the x-axis on the classification models.
The local model (DGCNN) was trained on 1024 points instead of the 10000 points used in the vanilla case, due to the memory requirements of backpropagating through the network. We notice that the vanilla classification model is more robust to rotation than the local model. This makes sense, as the local neighborhood features change drastically under rotation and therefore affect the local model much more.
Rotation Angle (°) | Cls Accuracy (%) | Cls Accuracy, Local (%) |
---|---|---|
0 | 93.17 | 96.60 |
15 | 88.03 | 94.64 |
30 | 80.63 | 81.00 |
45 | 69.57 | 55.71 |
60 | 54.98 | 23.18 |
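A minimal sketch of the rotation applied to the test clouds, assuming the angles in the table are in degrees (the helper name is mine):

```python
import math
import torch

def rotate_x(points, angle_deg):
    # points: (B, N, 3); rotate every cloud about the x-axis by angle_deg degrees
    t = math.radians(angle_deg)
    R = torch.tensor([[1.0, 0.0,          0.0],
                      [0.0, math.cos(t), -math.sin(t)],
                      [0.0, math.sin(t),  math.cos(t)]], dtype=points.dtype)
    return points @ R.T
```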
Experiment-2
We check the effect of changing the number of points as input to the segmentation models.
The local model (DGCNN) was trained on 1024 points instead of the 10000 points used in the vanilla case, due to the memory requirements of backpropagating through the network. We notice that the vanilla segmentation model is more robust to the number of input points than the local model; the local model actually improves as the point count approaches the 1024 points it was trained on.
Num. Points | Seg Accuracy (%) | Seg Accuracy, Local (%) |
---|---|---|
10000 | 87.78 | 80.36 |
5000 | 87.86 | 83.41 |
2000 | 87.96 | 89.36 |
1000 | 87.85 | 90.08 |
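A minimal sketch of how the test-time point count can be varied by random subsampling (illustrative helper; for simplicity it reuses the same random indices for every cloud in the batch):

```python
import torch

def subsample(points, labels, num_points):
    # points: (B, N, 3), labels: (B, N) per-point part labels
    idx = torch.randperm(points.shape[1])[:num_points]
    return points[:, idx], labels[:, idx]
```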
4. Bonus Question - Locality
I implemented the Edge Convolution operator defined by DGCNN (Dynamic Graph CNN for Learning on Point Clouds) and replaced my convolutions with it. However, the operation has higher memory requirements, so I trained the model on 1024 points.
```python
import torch


class dgcnn_mixin:
    def knn(self, x, k):
        # Pairwise Euclidean distances between all points: (B, N, N)
        D = torch.cdist(x, x)
        # Indices of the k closest points (the point itself is included): (B, N, k)
        _, idx = torch.topk(D, k=k, dim=-1, largest=False)
        return idx

    def edge_conv(self, conv, x):
        B, N, F = x.shape
        device = x.device
        # Perform knn and get the k nearest indices: (B, N, k)
        idx = self.knn(x, self.k)
        # Offset the indices so they address the flattened (B*N, F) tensor
        base = torch.arange(0, B, device=device).unsqueeze(-1).unsqueeze(-1) * N
        idx = idx + base
        idx = idx.view(-1)
        # Construct the edge feature cat(nearest_x - x, x)
        x = x.contiguous()
        feature = x.view(B * N, -1)[idx, :]             # neighbor features: (B*N*k, F)
        feature = feature.view(B, N, self.k, F)         # -> (B, N, k, F)
        x = x.unsqueeze(2).repeat(1, 1, self.k, 1)      # center features:   (B, N, k, F)
        feature = torch.cat([feature - x, x], dim=-1)   # -> (B, N, k, 2F)
        feature = feature.permute(0, 3, 1, 2)           # -> (B, 2F, N, k) for the conv
        # Apply the shared MLP (e.g. 1x1 Conv2d layers)
        out = conv(feature)
        # Max-pool over the k neighbors
        out, _ = torch.max(out, dim=-1, keepdim=False)  # (B, C_out, N)
        out = out.permute(0, 2, 1)                      # (B, N, C_out)
        return out
```
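For context, a hypothetical classification network built on the `dgcnn_mixin` above might look like the sketch below; the layer widths, the value of `k`, and the class name are assumptions rather than my exact configuration.

```python
import torch
import torch.nn as nn


class cls_model(nn.Module, dgcnn_mixin):
    def __init__(self, num_classes=3, k=20):
        super().__init__()
        self.k = k  # number of neighbors used by edge_conv (assumed value)
        # Shared MLPs: input channels are 2*F because edge features are cat(x_j - x_i, x_i)
        self.conv1 = nn.Sequential(nn.Conv2d(6, 64, 1), nn.BatchNorm2d(64), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(128, 128, 1), nn.BatchNorm2d(128), nn.ReLU())
        self.head = nn.Linear(128, num_classes)

    def forward(self, points):                       # points: (B, N, 3)
        x = self.edge_conv(self.conv1, points)       # (B, N, 64)
        x = self.edge_conv(self.conv2, x)            # (B, N, 128)
        x = x.max(dim=1).values                      # global max pool -> (B, 128)
        return self.head(x)                          # class logits: (B, num_classes)


# Example: classify a batch of 2 clouds with 1024 points each
logits = cls_model()(torch.rand(2, 1024, 3))
```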
The test accuracy of my best classification model is 96.6%, compared to the baseline version’s accuracy of 93.17%. The local model is less often confused between lamps and chairs, and between vases and chairs. The confusion between lamps and vases is still present.
For class label “chair”, there was only 1 misclassified example. This point cloud is classified as “lamp”.
(Rendering of the misclassified chair point cloud omitted.)

For class label “vase”:

Misclassified as chair: (rendering omitted)

Misclassified as lamp: (renderings omitted)

For class label “lamp”:

Misclassified as chair: (rendering omitted)

Misclassified as vase: (renderings omitted)
The test accuracy of my best segmentation model is 89.99%, compared to the baseline version’s accuracy of 87.78%. We notice some local differences: for example, the arms of the couch are segmented better by the local model than by the vanilla model. The last two rows are failure cases of the model.
(Renderings omitted: each row showed the GT segmentation, the local model’s prediction, and the vanilla model’s prediction.)