16-889 Assignment 2

Byeongjoo Ahn (bahn@andrew.cmu.edu)

1. Exploring loss functions


1.1. Fitting a voxel grid (5 points)

The optimized voxel grid and the ground-truth voxel grid are visualized below. pytorch3d.ops.cubify is used for converting the voxel grid to a mesh for a better visualization of the voxel shape.

1.1. Optimized voxel
1.1. Ground-truth voxel

1.2. Fitting a point cloud (10 points)

The optimized point cloud and the ground-truth point cloud are visualized below.

1.2. Optimized point cloud
1.2. Ground-truth point cloud
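
The report does not state which loss drives the point-cloud fit. A common choice for this kind of task is the symmetric chamfer distance, and the NumPy sketch below illustrates it under that assumption. The function name `chamfer_distance` and the toy data are hypothetical and are not taken from the assignment code.

```python
import numpy as np

def chamfer_distance(src: np.ndarray, tgt: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = src[:, None, :] - tgt[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Mean squared distance from each point to its nearest neighbour in the other set.
    src_to_tgt = sq_dist.min(axis=1).mean()
    tgt_to_src = sq_dist.min(axis=0).mean()
    return float(src_to_tgt + tgt_to_src)

# Toy data (assumption): a random cloud and a copy shifted by 0.1 along every axis.
rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))
shifted = points + 0.1
```

The distance is zero only when every point has an exact counterpart in the other set, so minimizing it pulls the optimized cloud onto the target without requiring a fixed point-to-point correspondence.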

1.3. Fitting a mesh (5 points)

The optimized mesh and the ground-truth mesh are visualized below.

1.3. Optimized mesh
1.3. Ground-truth mesh

2. Reconstructing 3D from single view


2.1. Image to voxel grid (15 points)

The input RGB, a render of the predicted voxel grid, and a render of the ground-truth voxel grid are shown below.

2.1. Input RGB
2.1. Predicted voxel grid
2.1. Ground-truth voxel grid

To handle the class imbalance (i.e., the difference between the number of occupied and unoccupied voxels), the loss is normalized using the ratio between the number of occupied voxels and the number of unoccupied voxels. batch_size is set to 32 and max_iter is set to 10000. Marching cubes is used to visualize the voxel grid.
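
One way to realize such a ratio-based normalization is to up-weight the occupied voxels in a binary cross-entropy loss. The sketch below is a NumPy illustration under that assumption; the exact weighting used in the report is not given, and `balanced_bce` and the toy grid are hypothetical.

```python
import numpy as np

def balanced_bce(logits: np.ndarray, target: np.ndarray) -> float:
    """Binary cross-entropy with occupied voxels re-weighted by n_empty / n_occupied."""
    n_occ = target.sum()
    n_empty = target.size - n_occ
    # Assumed weighting: up-weight the rarer occupied class; guard against empty grids.
    pos_weight = n_empty / max(n_occ, 1.0)
    # Numerically stable log-sigmoid terms.
    log_p = -np.logaddexp(0.0, -logits)      # log(sigmoid(x))
    log_not_p = -np.logaddexp(0.0, logits)   # log(1 - sigmoid(x))
    loss = -(pos_weight * target * log_p + (1.0 - target) * log_not_p)
    return float(loss.mean())

# Toy 32^3 grid (assumption) in which roughly 5% of the voxels are occupied.
rng = np.random.default_rng(0)
target = (rng.random((32, 32, 32)) < 0.05).astype(np.float64)
all_empty_logits = np.full(target.shape, -10.0)      # predicts "empty" everywhere
perfect_logits = np.where(target > 0, 10.0, -10.0)   # confident and correct
```

With this weighting, a network that predicts "empty" everywhere is penalized heavily even though most voxels are in fact empty. In PyTorch, the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss` offers the same kind of re-weighting.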

2.2. Image to point cloud (15 points)

The input RGB, a render of the predicted point cloud, and a render of the ground-truth point cloud are shown below.

2.2. Input RGB
2.2. Predicted point cloud
2.2. Ground-truth point cloud

batch_size is set to 32, max_iter is set to 10000, and n_points is set to 5000.

2.3. Image to mesh (15 points)

The input RGB, a render of the predicted mesh, and a render of the ground-truth mesh are shown below.

2.3. Input RGB
2.3. Predicted mesh
2.3. Ground-truth mesh

batch_size is set to 32, max_iter is set to 10000, and the initial mesh is set to a sphere.

2.4. Quantitative comparisons (10 points)

The quantitative comparison of the F1 score at a threshold of 0.05 is shown below.

3D representation    Voxel grid    Point cloud    Mesh
F1@0.05              65.749        96.889         95.742
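
For reference, the F1 score at a distance threshold is usually computed from nearest-neighbour distances between points sampled from the prediction and from the ground truth. The sketch below follows that standard definition; it is not the assignment's evaluation code, and `f1_at_threshold` and the toy points are hypothetical.

```python
import numpy as np

def f1_at_threshold(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.05):
    """Return (precision, recall, F1) in percent for point sets of shape (N, 3), (M, 3)."""
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: share of predicted points with a ground-truth point within the threshold.
    precision = 100.0 * np.mean(dist.min(axis=1) < threshold)
    # Recall: share of ground-truth points with a predicted point within the threshold.
    recall = 100.0 * np.mean(dist.min(axis=0) < threshold)
    f1 = 2.0 * precision * recall / max(precision + recall, 1e-8)
    return float(precision), float(recall), float(f1)

# Toy example (assumption): four ground-truth points, one prediction far off.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pred = gt.copy()
pred[0] += 0.01   # still within the 0.05 threshold
pred[1] += 0.50   # far from every ground-truth point
precision, recall, f1 = f1_at_threshold(pred, gt)
```

In this toy case three of the four points match in each direction, so precision, recall, and F1 all come out at 75%.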

2.5. Analyse effects of hyperparms variations (10 points)

We varied n_points, the number of sampled points. We compare the results for three values of n_points in the following table and figures. batch_size is set to 8 and max_iter is set to 10000 in this comparison.

n_points          100       2500      5000
Precision@0.05    52.978    93.534    93.784
Recall@0.05       58.721    93.940    96.054
F1@0.05           55.409    93.442    94.407

2.5. Ground-truth shape
2.5. Predicted point cloud (n_points=100)
2.5. Predicted point cloud (n_points=2500)
2.5. Predicted point cloud (n_points=5000)

In this experiment, we changed the architecture of the point-cloud prediction network so that the number of predicted points matches the number of sampled points (i.e., the network outputs n_points × 3 values). As n_points changes, the predictions look visually similar: a smaller n_points appears to give only a sparser sampling of the same shape. However, the F1 score is much lower when n_points=100. We suspect this is because the threshold (0.05) is too small for n_points=100: with so few points, a point can lie on the correct surface and still have no counterpart within 0.05. Once there are enough points, the F1 score barely changes, as the results for n_points=2500 and n_points=5000 show, which is consistent with the visually similar reconstruction quality. We conclude that the threshold should be chosen with n_points in mind so that the quantitative metric captures the essential information about the reconstructed shape.
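
The suspected sampling effect can be checked in isolation with a toy experiment. The sketch below draws two independent samples from the same unit sphere and scores one against the other, so any F1 gap comes from sampling density alone. The sphere, the sample sizes, and the helper names are assumptions for illustration and do not reproduce the numbers in the table.

```python
import numpy as np

def sample_sphere(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n points uniformly from the surface of the unit sphere."""
    p = rng.normal(size=(n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def f1_at_threshold(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.05) -> float:
    """F1 score in percent from nearest-neighbour distances between two point sets."""
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = np.mean(dist.min(axis=1) < threshold)
    recall = np.mean(dist.min(axis=0) < threshold)
    return float(100.0 * 2.0 * precision * recall / max(precision + recall, 1e-8))

rng = np.random.default_rng(0)
f1_by_n = {}
for n in (100, 1000):
    # Both sets lie exactly on the same surface; only the sampling density differs.
    f1_by_n[n] = f1_at_threshold(sample_sphere(n, rng), sample_sphere(n, rng))
```

Even though both samples describe exactly the same shape, the sparse case scores lower, because few points happen to land within 0.05 of a point in the other sample.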

2.6. Interpret your model (15 points)

Goal: To find the correspondence between the initial mesh and the final deformed mesh

Here we analyze how our mesh is deformed from the initial shape to the final shape. Our goal is to see whether the mesh is deformed regularly (i.e., the neighborhood of each vertex does not change drastically), so that the network produces a deformed mesh with reasonable connectivity. We also want to see the effect of the target topology on the deformation, so we divide the experiment into two cases: 1) the ground-truth mesh has the same topology as the initial mesh; 2) the ground-truth mesh has a different topology from the initial mesh. Because we use a sphere as the initial mesh, the first case corresponds to shapes of genus 0, and the second case corresponds to shapes of genus larger than 0.
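
To decide which case a given ground-truth mesh falls into, the genus of a closed, connected, orientable triangle mesh can be read off the Euler characteristic, V - E + F = 2 - 2g. The sketch below is a minimal pure-Python illustration; the helper names and the two toy meshes are hypothetical and are not part of the assignment code.

```python
def genus(faces: list[tuple[int, int, int]]) -> int:
    """Genus of a closed, connected, orientable triangle mesh via V - E + F = 2 - 2g."""
    vertices = {v for face in faces for v in face}
    # Each undirected edge is stored once, regardless of which face lists it.
    edges = {frozenset((face[i], face[(i + 1) % 3])) for face in faces for i in range(3)}
    euler = len(vertices) - len(edges) + len(faces)
    return (2 - euler) // 2

# Genus 0: a tetrahedron, which is topologically a sphere.
tetra_faces = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]

def torus_faces(n: int = 4, m: int = 4) -> list[tuple[int, int, int]]:
    """Triangulate an n x m vertex grid that wraps around in both directions (genus 1)."""
    def idx(i: int, j: int) -> int:
        return (i % n) * m + (j % m)

    faces = []
    for i in range(n):
        for j in range(m):
            a, b, c, d = idx(i, j), idx(i + 1, j), idx(i + 1, j + 1), idx(i, j + 1)
            faces += [(a, b, c), (a, c, d)]
    return faces
```

Deforming vertices never changes V, E, or F, which is why a sphere-initialized mesh stays at genus 0 no matter how the network moves its vertices.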

To visualize the correspondences between the initial mesh and the final mesh, we assign a color to each vertex of the initial mesh and reuse the same color for the corresponding vertex of the deformed final mesh. We use two color maps: i) a continuous color map based on the vertex location; ii) a discrete color map for the six extremes of the initial mesh (i.e., the 200 vertices with the largest and the 200 with the smallest coordinates along each of the x, y, and z axes). The color maps on the initial sphere are visualized below.

i) Continuous color map
ii) Discrete color map
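
The two color maps can be sketched as follows. This is an illustration under assumptions rather than the code used for the figures: the palette, the gray default color, the function names, and the random stand-in for the sphere vertices are all hypothetical.

```python
import numpy as np

def continuous_colors(verts: np.ndarray) -> np.ndarray:
    """Map each vertex position to an RGB color by rescaling x, y, z to [0, 1]."""
    lo, hi = verts.min(axis=0), verts.max(axis=0)
    return (verts - lo) / np.maximum(hi - lo, 1e-8)

def discrete_colors(verts: np.ndarray, k: int = 200) -> np.ndarray:
    """Color the k extreme vertices at each end of the x, y, z axes; the rest stay gray."""
    palette = np.array([
        [1, 0, 0], [0, 1, 0], [0, 0, 1],   # largest x, y, z (palette is an assumption)
        [0, 1, 1], [1, 0, 1], [1, 1, 0],   # smallest x, y, z
    ], dtype=np.float64)
    colors = np.full(verts.shape, 0.5)
    for axis in range(3):
        order = np.argsort(verts[:, axis])
        colors[order[-k:]] = palette[axis]       # k largest coordinates on this axis
        colors[order[:k]] = palette[axis + 3]    # k smallest coordinates on this axis
    return colors

# Stand-in for the initial sphere (assumption): random unit vectors, not an icosphere.
rng = np.random.default_rng(0)
verts = rng.normal(size=(2562, 3))
verts /= np.linalg.norm(verts, axis=1, keepdims=True)
smooth_colors = continuous_colors(verts)
corner_colors = discrete_colors(verts)
```

Because deformation only moves vertices and keeps their indices, the same color arrays can be attached to the deformed mesh, which is what makes the correspondence visible.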

We visualize the ground-truth mesh, the predicted mesh with the continuous color map, and the predicted mesh with the discrete color map below. The colors indicate the correspondence between the initial sphere mesh and the predicted mesh. The results for each case (i.e., each target topology) are shown separately in the following figures.

Ground-truth mesh
Initial mesh (continuous color map)
Deformed mesh (continuous color map)
Initial mesh (discrete color map)
Deformed mesh (discrete color map)

Ground-truth mesh
Initial mesh (continuous color map)
Deformed mesh (continuous color map)
Initial mesh (discrete color map)
Deformed mesh (discrete color map)

From the visualization, we observe that the output mesh is not deformed regularly in either case: the neighborhood of each vertex changes drastically. In the discrete color map, a single color is even split into several separate parts (e.g., cyan in the first row). Although the initial locations are preserved to some degree (e.g., the vertices at the upper part of the sphere remain around the upper part of the chair, as the magenta color in the discrete color map shows), the displacement does not vary smoothly across vertices, and the final mesh therefore has an irregular connectivity. This non-smooth deformation occurs even in case 1, where an exact deformation to the target shape exists, which implies that a more advanced mesh-prediction architecture is required for a smooth deformation.

In summary, the deformation of the mesh is irregular even when the target topology and the initial topology are the same. To address this problem, we can add a regularization term that enforces smoothness of the displacement with respect to the vertex locations. The current Laplacian regularization can enforce smoothness of the final mesh, but it cannot guarantee smoothness of the deformation. A smooth deformation would help reduce artifacts caused by the irregular connectivity (e.g., the sharp edges of the chair).
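
One possible form of such a regularizer penalizes the difference between the displacements of vertices that share an edge. The sketch below is a hypothetical illustration of that idea, not a term that was implemented or evaluated in this report; `displacement_smoothness` and the toy mesh are assumptions.

```python
import numpy as np

def displacement_smoothness(displacement: np.ndarray, edges: np.ndarray) -> float:
    """Mean squared difference of per-vertex displacements across mesh edges.

    displacement: (V, 3) offsets added to the initial vertices.
    edges: (E, 2) integer vertex indices of each edge.
    """
    diff = displacement[edges[:, 0]] - displacement[edges[:, 1]]
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# Toy mesh (assumption): a tetrahedron with all six edges.
edges = np.array([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]])
rigid_shift = np.tile([0.3, -0.1, 0.2], (4, 1))    # every vertex moves identically
rng = np.random.default_rng(0)
noisy_shift = rng.normal(scale=0.3, size=(4, 3))   # neighbours move independently
```

The term is zero for a rigid translation and grows when neighbouring vertices move in different directions, so it targets the deformation itself rather than the shape of the final surface.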

For case 2, where the different topology means the initial mesh can never be deformed into the target shape without changing its connectivity, one possible approach is to estimate another representation that allows topology changes, such as an implicit surface representation, and then construct a mesh from it using the marching cubes algorithm.