Homework

Name: Tianyuan Zhang.

Andrew ID: tianyuaz

Problem 1 - Fitting a single 3D shape

Here I implement the 3 types of losses as suggested. All of these losses are reused directly in Problem 2.

1.1 Loss for voxels

I use a balanced binary cross-entropy loss on sigmoid outputs, reweighting the positive and negative samples according to their respective counts.
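A minimal sketch of such a balanced loss; the function name and exact weighting scheme are my illustrative choices, not necessarily the template's:

```python
import torch
import torch.nn.functional as F

def balanced_bce_loss(logits, targets):
    # Reweight occupied (positive) and empty (negative) voxels by their
    # inverse frequency so both classes contribute equally on average.
    pos = targets.sum()
    neg = targets.numel() - pos
    w_pos = targets.numel() / (2.0 * pos.clamp(min=1))
    w_neg = targets.numel() / (2.0 * neg.clamp(min=1))
    weights = torch.where(targets > 0.5, w_pos, w_neg)
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```

When the classes are already balanced, both weights reduce to 1 and this is the plain BCE loss.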

Fitting results.
Top: predictions -- Bottom: ground truth

1.2 Loss for point clouds

I implement a chamfer distance loss. Since I did not implement it with kNN, I compute it from the full matrix of pairwise distances, so my implementation costs more memory:
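A sketch of this pairwise-distance formulation (names are illustrative; the full N×M distance matrix is exactly what drives the memory cost):

```python
import torch

def chamfer_loss(pred, gt):
    # pred: (N, 3), gt: (M, 3). Materializes the full (N, M) distance
    # matrix instead of using a kNN structure, so memory is O(N * M).
    d = torch.cdist(pred, gt)  # pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```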

Fitting results.
Top: predictions -- Bottom: ground truth

1.3 Loss for meshes.

The code template already implements point sampling for us, so we only need to implement a Laplacian smoothing loss.

To implement a Laplacian smoothing loss, we need access to the adjacency matrix, which is a sparse matrix.

My implementation directly uses pytorch3d.
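The actual loss comes from pytorch3d, but the computation it performs can be sketched in plain PyTorch with a sparse adjacency matrix (uniform weighting; tensor names are my assumptions):

```python
import torch

def laplacian_smoothing_loss(verts, edges):
    # verts: (V, 3) vertex positions; edges: (E, 2) long tensor of mesh edges.
    # Build a sparse adjacency matrix, then penalize each vertex's offset
    # from the mean of its neighbors (the uniform Laplacian).
    V = verts.shape[0]
    idx = torch.cat([edges, edges.flip(1)], dim=0).t()  # symmetrize edges
    vals = torch.ones(idx.shape[1])
    adj = torch.sparse_coo_tensor(idx, vals, (V, V)).coalesce()
    deg = torch.sparse.sum(adj, dim=1).to_dense().clamp(min=1)
    neighbor_mean = torch.sparse.mm(adj, verts) / deg[:, None]
    return (verts - neighbor_mean).norm(dim=1).mean()
```

The loss is zero when every vertex already sits at the centroid of its neighbors, which is why minimizing it smooths the mesh.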

Fitting results.
Top: predictions -- Bottom: ground truth

Problem 2 - Single-image shape prediction

2.1 Image to Voxel grid

We use an architecture of 4 upsampling ConvTranspose3d layers plus one reshape upsampling to recover the voxel grid.

The main problem in predicting voxels is class imbalance: we have far more negative samples than positive ones.

So I reweight the loss to balance the learning.
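The decoder described above can be sketched as follows, assuming a 512-d image feature and a 32³ output grid (channel counts and sizes are my illustrative choices, not the exact ones used):

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Reshape "upsampling" to a 2^3 grid, then 4 ConvTranspose3d stages
    # double the resolution each time: 2 -> 4 -> 8 -> 16 -> 32.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 64 * 2 * 2 * 2)
        chans = [64, 32, 16, 8, 8]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.ConvTranspose3d(c_in, c_out, 4, stride=2, padding=1),
                       nn.BatchNorm3d(c_out), nn.ReLU(inplace=True)]
        self.up = nn.Sequential(*layers)
        self.head = nn.Conv3d(chans[-1], 1, kernel_size=1)  # occupancy logits

    def forward(self, feat):
        x = self.fc(feat).view(-1, 64, 2, 2, 2)
        return self.head(self.up(x))  # (B, 1, 32, 32, 32)
```

The head outputs raw logits so it can feed directly into the balanced BCE loss from Problem 1.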

F1 score: 54.672

Visualization: Top: predictions -- Bottom: ground truth

2.2 Image to point clouds

The point-cloud regression head predicts a tensor of shape [N, 3], where N is the number of points.

We use a stack of conv-bn-relu blocks followed by a final Linear layer.
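A sketch of such a head, assuming a (B, C, H, W) feature map as input; the channel counts and N = 1024 points are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PointHead(nn.Module):
    # conv-bn-relu blocks pool the feature map down, and a final Linear
    # layer regresses all N * 3 coordinates at once.
    def __init__(self, in_ch=256, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, n_points * 3)

    def forward(self, x):
        feat = self.convs(x).flatten(1)        # (B, 128)
        out = torch.tanh(self.fc(feat))        # keep coordinates bounded
        return out.view(-1, self.n_points, 3)  # (B, N, 3)
```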

The point-cloud model converges very easily.

F1 score: 90.924

Visualization: Top: predictions -- Bottom: ground truth

2.3 Image to meshes

F1 score: 89.639

Visualization: Top: predictions -- Bottom: ground truth

2.4 Comparison

| representation | voxel | point | mesh |
| --- | --- | --- | --- |
| F1 score @ 0.05 | 54.672 | 90.924 | 89.639 |

The point-cloud representation has the highest F1 score, while voxels have the lowest.
The reason is that the way we compute the F1 score does not actually measure the distance between two full shapes; it only compares sampled points. Thus, the point cloud has the most capacity, or freedom, to approximate the target shape without being over-constrained.

For voxel grids, it is hard to produce a valid voxel grid representing the shape in the first place, and to compute the F1 score we must first convert the voxels to a mesh and then sample points. Such a long pipeline can reduce performance and makes the optimization harder.

2.5 Hyper-parameter variations.

  1. Training iterations. I find that longer training greatly improves the performance of voxels.

| representation | voxel | point | mesh |
| --- | --- | --- | --- |
| F1 score @ 0.05 | 40.07 => 54.672 | 90.924 => 94.510 | 89.639 => 93.158 |

2.6 New interpretations of the models.

I want to define a new, simple metric to reveal a bias of the models: each model is, in effect, averaging shapes from the training dataset.

Maybe "averaging" is not the best description; rather, the model interpolates the shapes of similar training images.

In other words, the model does not have the capacity to directly regress the shape of the target object, so it averages the shapes of similar images in the training set.

So I compute the F1 score between the shapes the model generates from two different images.

I sample 600 pairs of images from the validation set, get the model's shape predictions for the two images in each pair, compute the F1 score between the two predictions, and average over all pairs.
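The comparison between two predicted shapes can be sketched with a point-based F1 at a distance threshold (0.05, matching the tables); function and variable names are illustrative:

```python
import torch

def f1_between(points_a, points_b, thresh=0.05):
    # F1 between two sampled point clouds: a point counts as "matched"
    # if some point of the other cloud lies within `thresh` of it.
    d = torch.cdist(points_a, points_b)
    precision = (d.min(dim=1).values < thresh).float().mean()
    recall = (d.min(dim=0).values < thresh).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```

A high average F1 across pairs of *different* images means the model emits nearly the same shape regardless of input.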

| representation | voxel | point | mesh | GroundTruth mesh |
| --- | --- | --- | --- | --- |
| F1 score @ 0.05 | 53.745 | 78.169 | 76.574 | 51.502 |

From this metric we can conclude that the point-cloud and mesh predictors are somewhat overfitting the training data: their predictions for two different images are much more similar to each other (F1 of 78.169 and 76.574) than the corresponding ground-truth shapes are (F1 of 51.502).