16-889 Assignment 2: Single View to 3D
Name: Caroline Ai
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
[Figure: optimized voxel grid | ground truth voxel grid]
1.2. Fitting a point cloud (10 points)
[Figure: optimized point cloud | ground truth point cloud]
1.3. Fitting a mesh (5 points)
[Figure: optimized mesh | ground truth mesh]
2. Reconstructing 3D from single view
2.1. Image to voxel grid (15 points)
[Figure, 3 examples: input RGB | predicted 3D voxel grid | ground truth mesh]
2.2. Image to point cloud (15 points)
[Figure, 3 examples: input RGB | predicted 3D point cloud | ground truth mesh]
2.3. Image to mesh (15 points)
[Figure, 3 examples: input RGB | predicted mesh | ground truth mesh]
2.4. Quantitative comparisons (10 points)
| | voxels | point clouds | meshes |
| loss | 0.115 | 0.002 | 0.007 |
| Avg F1 score | 57.354 | 84.564 | 78.078 |
At a 0.05 threshold, the average F1 score of the voxel grid network is lower than that of the point cloud network and the mesh network. The voxel grid network is trained with binary cross-entropy (BCE) loss between the predicted and ground-truth voxel grids, whereas the point cloud network uses Chamfer loss and the mesh network combines Chamfer loss with a Laplacian smoothing loss. The BCE loss values are larger than the Chamfer-based losses, which is consistent with the much lower average F1 score, although the two kinds of loss are on different scales and are not directly comparable.
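As a reference for how an F1 score at a 0.05 threshold can be computed, here is a minimal sketch in plain Python. It assumes both shapes are represented as sampled 3D points and that the score is reported on a 0-100 scale, as the values in the table suggest. The function names and the toy points are hypothetical, and the actual evaluation code may differ.

```python
import math


def nearest_distance(point, points):
    """Euclidean distance from `point` to its nearest neighbour in `points`."""
    return min(math.dist(point, other) for other in points)


def f1_at_threshold(pred, gt, threshold=0.05):
    """F1 score (0-100) between two point sets at a distance threshold.

    Precision: share of predicted points within `threshold` of a GT point.
    Recall:    share of GT points within `threshold` of a predicted point.
    """
    precision = 100.0 * sum(nearest_distance(p, gt) < threshold for p in pred) / len(pred)
    recall = 100.0 * sum(nearest_distance(g, pred) < threshold for g in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): two predicted points land close to the ground truth,
# the other two are far away.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
pred = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 0.3)]

score = f1_at_threshold(pred, gt)
print(f"F1@0.05 = {score:.3f}")
```

In this toy case half of the predicted points match and half of the ground-truth points are covered, so precision and recall are both 50. The brute-force nearest-neighbour search is quadratic and is only meant to show the definition.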
2.5. Analyse effects of hyperparameter variations (10 points)
Holding everything else constant, we render the predicted meshes while changing only w_chamfer.
[Figure, 3 examples: input RGB | w_chamfer = 0.1 | w_chamfer = 1 | w_chamfer = 10]
We can see that a larger w_chamfer makes the Chamfer distance more dominant in the loss function. As w_chamfer increases, the vertices of the meshes become more connected. The average F1 score is highest at w_chamfer = 1, which appears to be the most reasonable of the three weights. At w_chamfer = 10, the points are too connected.
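To make the role of w_chamfer concrete, the following is a minimal sketch of a weighted mesh loss in plain Python. It assumes the total loss is a weighted sum of a Chamfer term and a smoothness term; the name `w_smooth`, the squared-distance form of the Chamfer term, and the toy numbers are assumptions for illustration, not values from the experiments above.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    Assumed form: mean squared nearest-neighbour distance, summed over
    both directions. Other conventions (unsquared, summed) also exist.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def mesh_loss(pred_pts, gt_pts, smooth_term, w_chamfer=1.0, w_smooth=0.1):
    """Weighted sum of the Chamfer term and a precomputed smoothness term."""
    return w_chamfer * chamfer_distance(pred_pts, gt_pts) + w_smooth * smooth_term


# Toy data (made up): predicted points sit 0.5 above the ground-truth points.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
smooth_term = 0.2  # hypothetical Laplacian smoothing value

chamfer_shares = []
for w in (0.1, 1.0, 10.0):
    total = mesh_loss(pred, gt, smooth_term, w_chamfer=w)
    share = w * chamfer_distance(pred, gt) / total
    chamfer_shares.append(share)
    print(f"w_chamfer={w:>4}: total={total:.3f}, Chamfer share={share:.1%}")
```

With the smoothness weight held fixed, the Chamfer term's share of the total loss grows as w_chamfer increases, so the optimizer pays relatively less attention to smoothness.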
Holding everything else constant, we render the predicted meshes while changing only n_points.
[Figure, 3 examples: input RGB | n_points = 2000 | n_points = 5000 | n_points = 10000]
n_points sets the number of points sampled from the meshes. From the animations, we can see that at n_points = 5000 the predictions are more reasonable, with a higher average F1 score.
The larger n_points is, the longer it takes to train the model and make predictions.
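For intuition about what n_points controls, here is a minimal sketch of area-weighted surface sampling in plain Python. It assumes a triangle mesh given as vertex and face lists; the function names and the toy square mesh are hypothetical, and the training code most likely relies on a library sampler (PyTorch3D provides `sample_points_from_meshes`, for example) rather than anything like this.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of the 3D triangle (a, b, c) via the cross product."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_points(vertices, faces, n_points, seed=0):
    """Sample `n_points` points uniformly from the mesh surface."""
    rng = random.Random(seed)
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    # Pick faces with probability proportional to their area.
    chosen = rng.choices(faces, weights=areas, k=n_points)
    points = []
    for face in chosen:
        a, b, c = (vertices[i] for i in face)
        # Uniform barycentric coordinates: reflect samples that fall
        # outside the triangle back inside.
        u, v = rng.random(), rng.random()
        if u + v > 1.0:
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        points.append(tuple(w * a[i] + u * b[i] + v * c[i] for i in range(3)))
    return points


# Toy mesh (made up): a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_faces = [(0, 1, 2), (0, 2, 3)]

pts = sample_points(square_vertices, square_faces, n_points=2000)
print(len(pts), pts[0])
```

The work per call grows linearly with n_points, and any loss computed on the samples grows with it too, which matches the longer training and prediction times noted above.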
2.6. Interpret your model (15 points)
We can visualize the predictions by coloring each voxel according to its loss: voxels with a large loss are shown in blue, and all others in red.
[Figure: Example 1 | Example 2 | Example 3]
We can see that for this voxel grid model, the upper right corner of the voxel grid (the front-facing side of the chair) generally has a larger loss. The lower left corner and the back of the voxel grid have a smaller loss.
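The coloring rule can be sketched as follows in plain Python. This assumes the per-voxel loss is binary cross-entropy between a predicted occupancy probability and a 0/1 ground-truth label; the cutoff `loss_threshold`, the function names, and the toy voxels are hypothetical and not taken from the visualizations above.

```python
import math


def voxel_bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one voxel (prob in [0, 1], label 0 or 1)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def colour_by_loss(pred_probs, gt_occupancy, loss_threshold=0.5):
    """Map each voxel index to "blue" (large loss) or "red" (small loss)."""
    colours = {}
    for idx, prob in pred_probs.items():
        loss = voxel_bce(prob, gt_occupancy[idx])
        colours[idx] = "blue" if loss > loss_threshold else "red"
    return colours


# Toy data (made up): voxel index -> predicted probability / ground-truth label.
pred = {(0, 0, 0): 0.9, (0, 0, 1): 0.2, (0, 1, 0): 0.6}
gt = {(0, 0, 0): 1, (0, 0, 1): 1, (0, 1, 0): 0}

colours = colour_by_loss(pred, gt)
print(colours)
```

Here the confident correct voxel comes out red, while the missed occupied voxel and the falsely occupied voxel come out blue.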