Name: Tianyuan Zhang
Andrew ID: tianyuaz
Here I implement the 3 types of losses as suggested. All of these losses are used directly in Problem 2.
I use a balanced sigmoid cross-entropy loss, reweighting the loss of positive and negative samples according to their sample counts.
```python
import torch
import torch.nn.functional as F

def voxel_loss(voxel_src, voxel_tgt):
    # Balanced binary cross-entropy for voxel grids.
    # voxel_src: predicted occupancy probabilities in [0, 1].
    # Positive and negative voxels are averaged separately, so the far more
    # numerous empty voxels do not dominate the loss.
    # Alternatives tried: plain BCE-with-logits and sigmoid focal loss.
    pos_mask = torch.where(voxel_tgt == 1)
    neg_mask = torch.where(voxel_tgt == 0)
    loss_pos = F.binary_cross_entropy(voxel_src[pos_mask], voxel_tgt[pos_mask])
    loss_neg = F.binary_cross_entropy(voxel_src[neg_mask], voxel_tgt[neg_mask])
    prob_loss = loss_pos * 0.5 + loss_neg * 0.5
    return prob_loss
```
Fitting results.
Top: predictions -- Bottom: ground truth
I implement a Chamfer distance loss. Since I did not implement it with KNN, I compute the full pairwise distance matrix instead, so my implementation costs more memory:
```python
import torch

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # Chamfer loss from scratch via the full pairwise distance matrix.
    # point_cloud_src: [B, N, 3], point_cloud_tgt: [B, M, 3]
    dist_mat = torch.cdist(point_cloud_src, point_cloud_tgt)  # [B, N, M]
    loss_1 = dist_mat.min(dim=-1)[0].mean()  # each src point -> nearest tgt point
    loss_2 = dist_mat.min(dim=1)[0].mean()   # each tgt point -> nearest src point
    loss = loss_1 * 0.5 + loss_2 * 0.5
    return loss
```
Fitting results.
Top: predictions -- Bottom: ground truth
The code template already implements point sampling for us, so we only need to implement a Laplacian smoothing loss.
To implement a Laplacian smoothing loss, we need access to the adjacency matrix, which is a sparse matrix.
My implementation directly uses PyTorch3D:
```python
from pytorch3d.loss import mesh_laplacian_smoothing

def smoothness_loss(mesh_src):
    # Uniform Laplacian smoothing: penalizes each vertex's offset from the
    # mean of its neighbors, built from the mesh's sparse adjacency matrix.
    loss_laplacian = mesh_laplacian_smoothing(mesh_src, method='uniform')
    return loss_laplacian
```
Fitting results.
Top: predictions -- Bottom: ground truth
We use an architecture of 4 upsampling ConvTranspose3d layers plus one reshape upsampling to recover the voxels (see the sketch below).
The main problem in predicting voxels is class imbalance: we have far more negative samples.
So I reweight the loss to balance the learning.
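A minimal sketch of this kind of decoder; the channel widths, latent size, and output resolution are my assumptions, not the exact values used:

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Sketch: latent vector -> reshape to a small 3D grid ("reshape upsampling"),
    # then 4 ConvTranspose3d layers, each doubling the resolution.
    def __init__(self, latent_dim=512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 2 * 2 * 2)  # reshape to a 2^3 grid

        def up(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose3d(cin, cout, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm3d(cout),
                nn.ReLU(inplace=True),
            )

        self.up1 = up(128, 64)  # 2^3  -> 4^3
        self.up2 = up(64, 32)   # 4^3  -> 8^3
        self.up3 = up(32, 16)   # 8^3  -> 16^3
        self.up4 = nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1)  # 16^3 -> 32^3

    def forward(self, z):
        x = self.fc(z).view(-1, 128, 2, 2, 2)
        x = self.up3(self.up2(self.up1(x)))
        return torch.sigmoid(self.up4(x))  # occupancy probabilities for the balanced BCE
```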
F1 score: 54.672
Visualization:
Top: predictions -- Bottom: ground truth
The point cloud regression head predicts a tensor of shape [N, 3], where N is the number of points. We use a stack of conv-bn-relu blocks and finally a Linear layer to do this (a sketch follows). The point cloud model converges very easily.
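A minimal sketch of such a head; I substitute linear-bn-relu blocks for the conv-bn-relu stack, and the layer widths and point count are assumptions:

```python
import torch
import torch.nn as nn

class PointCloudHead(nn.Module):
    # Sketch: latent feature vector -> N*3 coordinates, reshaped to [B, N, 3].
    # The widths (1024/2048) and n_points are assumptions, not the exact values used.
    def __init__(self, latent_dim=512, n_points=1000):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.BatchNorm1d(1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
            nn.Linear(2048, n_points * 3),
        )

    def forward(self, z):
        # tanh keeps predicted coordinates inside a bounded cube around the origin
        return torch.tanh(self.mlp(z)).view(-1, self.n_points, 3)
```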
F1 score: 90.924
Visualization:
Top: predictions -- Bottom: ground truth
F1 score: 89.639
Visualization:
Top: predictions -- Bottom: ground truth
representation | voxel | point | mesh |
---|---|---|---|
F1 score \@ 0.05 | 54.672 | 90.924 | 89.639 |
The point cloud representation has the highest F1 score, while voxels have the lowest.
The reason is that the way we compute the F1 score does not actually measure the distance between the two full shapes; we are only sampling points. Thus point clouds have the highest capacity, or freedom, to approximate the target shape without being overly constrained.
For voxel grids, it is hard to form a valid voxel field that represents the shape in the first place, and to compute the F1 score we must first convert the grid to a mesh and then sample points from it. This longer pipeline reduces performance and makes the optimization harder. The F1 computation over sampled points is sketched below.
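For reference, a minimal cdist-based sketch of an F1 @ threshold score over sampled points (my own version, not necessarily the exact evaluation code used here):

```python
import torch

def f1_score(pts_pred, pts_gt, threshold=0.05):
    # F1 @ threshold between two sampled point sets, via pairwise distances.
    # pts_pred: [N, 3], pts_gt: [M, 3]; returns F1 on a 0-100 scale.
    dist = torch.cdist(pts_pred, pts_gt)  # [N, M]
    precision = (dist.min(dim=1)[0] < threshold).float().mean()  # pred points near a GT point
    recall = (dist.min(dim=0)[0] < threshold).float().mean()     # GT points near a pred point
    return (2 * precision * recall / (precision + recall + 1e-8)).item() * 100
```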
representation | voxel | point | mesh |
---|---|---|---|
F1 score \@ 0.05 | 40.07 => 54.672 | 90.924 => 94.510 | 89.639 => 93.158 |
I want to define a new, simple metric that reveals a bias of the model: that it predicts something like an average of the shapes in the training set.
Maybe "average" is not the best description; rather, it interpolates the shapes of similar images.
In other words, the model does not have the capacity to directly regress the shape of the target object, so it averages the shapes of similar images in the training set.
So I will compute the F1 score between the shapes generated from two different images.
I sample 600 pairs of images from the validation set, get the model's shape predictions for the two images in each pair, compute the F1 score between the two predictions, and average over all pairs (sketched below).
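A minimal sketch of this pairwise metric, reusing the f1_score() sketch above; predictions are assumed to already be sampled as [N, 3] point sets:

```python
import random

def pairwise_prediction_f1(pred_points, n_pairs=600, threshold=0.05):
    # pred_points: list of [N, 3] point sets, one prediction per validation image.
    # High values mean predictions for *different* images look alike,
    # i.e., the model collapses toward an average shape.
    # f1_score() is the cdist-based sketch defined earlier.
    scores = []
    for _ in range(n_pairs):
        pts_a, pts_b = random.sample(pred_points, 2)
        scores.append(f1_score(pts_a, pts_b, threshold))
    return sum(scores) / len(scores)
```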
representation | voxel | point | mesh | GroundTruth mesh |
---|---|---|---|---|
F1 score \@ 0.05 | 53.745 | 78.169 | 76.574 | 51.502 |
The pairwise F1 of the point and mesh predictors (78.169 and 76.574) is far above the ground-truth baseline (51.502): predictions for two different images are much more similar to each other than the true shapes are. So we can conclude from this metric that the point cloud and mesh predictors are, to some extent, overfitting the training data!