16-889 Assignment 2
1. Exploring loss functions
1.1. Fitting a voxel grid (5 points)
The optimized voxel grid and the ground-truth voxel grid are visualized below. pytorch3d.ops.cubify is used to convert the voxel grid into a mesh for a better visualization of the voxel shape.
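For reference, a minimal sketch of how cubify can be used for this visualization; the 32^3 grid size and the 0.5 threshold here are illustrative assumptions, not necessarily the values used in our experiments.

```python
import torch
from pytorch3d.ops import cubify

# Hypothetical occupancy grid: a batch of one 32^3 voxel grid with values in [0, 1].
voxels = torch.rand(1, 32, 32, 32)

# cubify turns every voxel above the threshold into a unit cube and merges shared
# faces, returning a pytorch3d.structures.Meshes object that can be rendered.
mesh = cubify(voxels, thresh=0.5)
```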
1.2. Fitting a point cloud (10 points)
The optimized point cloud and the ground-truth point cloud are visualized below.
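As a reference for the point-cloud fitting setup, here is a minimal sketch of a symmetric chamfer loss built on pytorch3d.ops.knn_points, optimized over the source points; the tensor shapes, learning rate, and iteration count are illustrative assumptions rather than our exact settings.

```python
import torch
from pytorch3d.ops import knn_points

def chamfer_loss(src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    """Symmetric chamfer loss between batched point clouds (B, N, 3) and (B, M, 3)."""
    # Squared distance from each source point to its nearest target point, and vice versa.
    dist_src_to_tgt = knn_points(src, tgt, K=1).dists[..., 0]  # (B, N)
    dist_tgt_to_src = knn_points(tgt, src, K=1).dists[..., 0]  # (B, M)
    return dist_src_to_tgt.mean() + dist_tgt_to_src.mean()

# Fitting: treat the source points as learnable parameters and minimize the loss.
src = torch.randn(1, 5000, 3, requires_grad=True)
tgt = torch.randn(1, 5000, 3)  # placeholder for the ground-truth point cloud
optimizer = torch.optim.Adam([src], lr=1e-2)
for _ in range(1000):
    optimizer.zero_grad()
    loss = chamfer_loss(src, tgt)
    loss.backward()
    optimizer.step()
```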
1.3. Fitting a mesh (5 points)
The optimized mesh and the ground-truth mesh are visualized below.
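A minimal sketch of a mesh-fitting objective of this kind, combining a chamfer term on points sampled from both surfaces with a Laplacian smoothing term; the sample count and smoothing weight are assumptions, not our exact settings.

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(pred_mesh, gt_mesh, n_samples=5000, w_smooth=0.1):
    # Sample points on both surfaces and compare them with a chamfer term,
    # plus a Laplacian smoothing term that discourages a crumpled surface.
    pred_pts = sample_points_from_meshes(pred_mesh, n_samples)
    gt_pts = sample_points_from_meshes(gt_mesh, n_samples)
    loss_chamfer, _ = chamfer_distance(pred_pts, gt_pts)
    loss_smooth = mesh_laplacian_smoothing(pred_mesh, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```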
2. Reconstructing 3D from single view
2.1. Image to voxel grid (15 points)
The input RGB, a render of the predicted voxel grid, and a render of the ground-truth voxel grid are shown below.
To handle the uneven sample distribution (i.e., the difference between the number of positive and negative samples), the loss is normalized using the ratio between the number of occupied and unoccupied voxels. batch_size is set to 32, and max_iter is set to 10000. Marching cubes is used to visualize the voxel grid.
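One way to implement this normalization is to re-weight the positive (occupied) voxels by the negative-to-positive ratio in a binary cross-entropy loss; the sketch below is one such form, and the exact weighting used in our model may differ.

```python
import torch
import torch.nn.functional as F

def balanced_voxel_loss(pred_logits: torch.Tensor, gt_voxels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy with occupied voxels up-weighted by the negative/positive ratio.

    pred_logits, gt_voxels: (B, D, H, W); gt_voxels holds {0, 1} occupancy.
    """
    gt = gt_voxels.float()
    n_pos = gt.sum().clamp(min=1.0)   # occupied voxels
    n_neg = gt.numel() - gt.sum()     # unoccupied voxels
    pos_weight = n_neg / n_pos        # occupied voxels are rare, so weight them up
    return F.binary_cross_entropy_with_logits(pred_logits, gt, pos_weight=pos_weight)
```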
2.2. Image to point cloud (15 points)
The input RGB, a render of the predicted point cloud, and a render of the ground-truth point cloud are shown below.
batch_size is set to 32, max_iter is set to 10000, and n_points is set to 5000.
2.3. Image to mesh (15 points)
The input RGB, a render of the predicted mesh, and a render of the ground-truth mesh are shown below.
batch_size is set to 32, max_iter is set to 10000, and the initial mesh is set to a sphere.
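A minimal sketch of the sphere initialization and per-vertex offset prediction; the decoder architecture, the 512-dimensional image feature, and the ico-sphere level are illustrative assumptions, not our exact model.

```python
import torch
from pytorch3d.utils import ico_sphere

# Initial unit sphere mesh; the network only predicts per-vertex offsets,
# so the topology of this sphere is fixed throughout training.
src_mesh = ico_sphere(level=4)
n_verts = src_mesh.verts_packed().shape[0]

# Hypothetical decoder: an MLP that maps an image feature to one 3D offset per vertex.
decoder = torch.nn.Sequential(
    torch.nn.Linear(512, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, n_verts * 3),
)

image_feat = torch.randn(1, 512)              # placeholder encoder output
offsets = decoder(image_feat).reshape(-1, 3)  # (V, 3) offsets, one per vertex
pred_mesh = src_mesh.offset_verts(offsets)    # deform the sphere into the prediction
```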
2.4. Quantitative comparisons (10 points)
The quantitative comparison of F1 scores at a 0.05 threshold is shown below.
3D representation | Voxel grid | Point cloud | Mesh |
---|---|---|---|
F1@0.05 | 65.749 | 96.889 | 95.742 |
- The voxel-grid prediction performs the worst, and one of the main reasons is the limited resolution of the voxel grid. The resolution is not sufficient to represent the shape of the chair, especially when there are multiple holes in the chair (i.e., genus number larger than 0).
- The point-cloud prediction performs the best (almost the same as the mesh) because the point-cloud representation has no resolution limitation. It is also free from topology constraints, which partly explains why the point cloud works slightly better than the mesh representation.
- The mesh prediction performs almost the same as the point-cloud prediction. The architecture is exactly the same as in the point-cloud case, but it works slightly worse partly because of artifacts caused by incorrect connectivity. In particular, the topology cannot be changed from that of the initial mesh, which reduces the F1 score.
2.5. Analyse effects of hyperparameter variations (10 points)
We tuned the parameter n_points, which corresponds to the number of sampled points. We compare the results for three different values of n_points, as shown in the following table and figures. batch_size is set to 8 and max_iter is set to 10000 in this comparison.
n_points | 100 | 2500 | 5000 |
---|---|---|---|
Precision@0.05 | 52.978 | 93.534 | 93.784 |
Recall@0.05 | 58.721 | 93.940 | 96.054 |
F1@0.05 | 55.409 | 93.442 | 94.407 |
In this experiment, we changed the architecture of the point-cloud prediction network accordingly so that the number of points in the predicted point cloud matches the number of sampled points (i.e., the output of the point-cloud prediction network has n_points points). When we change n_points, the quality of the prediction is visually similar, and a smaller n_points appears to simply provide a sparser sampling of the same shape. However, the F1 score is much lower for n_points = 100. We suspect this is because the threshold used (i.e., 0.05) is too small for n_points = 100. When we have a sufficient number of points, the F1 score does not change, as shown by the results for n_points = 2500 and n_points = 5000. The similar F1 scores also support the observation that n_points does not visually change the quality of the reconstruction. We conclude that the threshold should be chosen carefully according to the choice of n_points so that the quantitative number captures the essential information of the reconstructed shape.
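For reference, a minimal sketch of how precision, recall, and F1 at a distance threshold can be computed with pytorch3d.ops.knn_points; the evaluation code used for the tables above may differ, and it reports percentages rather than the fractions returned here.

```python
import torch
from pytorch3d.ops import knn_points

def f1_at_threshold(pred_pts: torch.Tensor, gt_pts: torch.Tensor, thresh: float = 0.05):
    """Precision/recall/F1 between two point clouds at a distance threshold.

    pred_pts, gt_pts: (1, N, 3) and (1, M, 3) tensors.
    """
    # Distance from each predicted point to its nearest ground-truth point, and vice versa.
    d_pred = knn_points(pred_pts, gt_pts, K=1).dists[..., 0].sqrt()  # (1, N)
    d_gt = knn_points(gt_pts, pred_pts, K=1).dists[..., 0].sqrt()    # (1, M)
    precision = (d_pred < thresh).float().mean()
    recall = (d_gt < thresh).float().mean()
    f1 = 2 * precision * recall / (precision + recall + 1e-8)
    return precision, recall, f1
```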
2.6. Interpret your model (15 points)
Goal: To find the correspondence between the initial mesh and the final deformed mesh
Here we analyze how our mesh is deformed from the initial shape to the final shape. Our goal is to see whether the mesh is deformed regularly (i.e., the neighboring vertices of each vertex do not change drastically), so that the network can provide a deformed mesh with reasonable connectivity. We also want to see the effect of the target topology on the deformation, so we divide the experiment into two cases: 1) the ground-truth mesh has the same topology as the initial mesh; 2) the ground-truth mesh has a different topology from the initial mesh. As we use a sphere as the initial mesh, the first case corresponds to a shape with a genus number of 0, and the second case corresponds to a shape with a genus number larger than 0.
To visualize the correspondences between the initial mesh and the final mesh, we assign colors to the vertices of the initial mesh and use the corresponding color for each vertex in the deformed final mesh. We use two color maps: i) a continuous color map based on the vertex location; ii) a discrete color map for the 6 corners of the initial mesh (i.e., the maximum and minimum 200 vertices along the x, y, and z axes). The color maps on the initial sphere are visualized below.
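A minimal sketch of how these two color maps can be constructed from the vertices of the initial sphere; the specific palette colors and the gray default are assumptions.

```python
import torch

def vertex_colormaps(verts: torch.Tensor, n_corner: int = 200):
    """Build the two vertex color maps described above.

    verts: (V, 3) vertex positions of the initial sphere.
    Returns (continuous, discrete) colors, both (V, 3) in [0, 1].
    """
    # i) Continuous map: normalize xyz coordinates into [0, 1] and use them as RGB.
    vmin, vmax = verts.min(dim=0).values, verts.max(dim=0).values
    continuous = (verts - vmin) / (vmax - vmin)

    # ii) Discrete map: paint the n_corner extreme vertices along each axis with
    #     one of six fixed colors; all remaining vertices stay gray.
    discrete = torch.full_like(verts, 0.5)
    palette = torch.tensor([
        [1.0, 0.0, 0.0], [0.0, 1.0, 1.0],   # +x / -x
        [0.0, 1.0, 0.0], [1.0, 0.0, 1.0],   # +y / -y
        [0.0, 0.0, 1.0], [1.0, 1.0, 0.0],   # +z / -z
    ])
    for axis in range(3):
        top = verts[:, axis].topk(n_corner).indices
        bottom = (-verts[:, axis]).topk(n_corner).indices
        discrete[top] = palette[2 * axis]
        discrete[bottom] = palette[2 * axis + 1]
    return continuous, discrete
```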
We visualize the ground-truth mesh, the predicted mesh with the continuous color map, and the predicted mesh with the discrete color map below. The colors indicate the correspondence between the initial sphere mesh and the predicted mesh. We provide the results for each case (i.e., target topology) separately in the figures below.
- Case #1: The ground-truth mesh has the same topology as the initial mesh (target genus number = 0)
- Case #2: The ground-truth mesh has a different topology from the initial mesh (target genus number > 0)
From the visualization, we observe that the output mesh is not deformed regularly: the neighboring vertices of each vertex change drastically in both cases. In the visualization of the discrete color map, a single color is even split into multiple parts (e.g., cyan in the first row). Although the initial locations are preserved to some degree (e.g., a vertex at the upper part of the sphere remains around the upper part of the chair, as shown by the magenta color in the discrete color map), the amount of displacement is not smooth across the vertices, and the final mesh therefore has an irregular connectivity. This non-smooth deformation occurs even in case #1, where an exact deformation to the target shape exists, which implies that a more advanced structure for mesh prediction is required to obtain a smooth deformation.
In summary, the deformation of the mesh is irregular even when the target topology is the same as the initial topology. To address this problem, we can add a regularization term that enforces smoothness of the displacement field with respect to the vertex locations; a sketch of such a regularizer is given below. The current Laplacian regularization enforces smoothness of the final mesh, but it cannot guarantee smoothness of the deformation. A smooth deformation would help reduce artifacts caused by irregular connectivity (e.g., the sharp edges of the chair).
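A minimal sketch of one possible displacement-smoothness regularizer, penalizing differences of the predicted offsets across mesh edges; this is a proposal we have not implemented, and the exact loss form and weighting are assumptions.

```python
import torch
from pytorch3d.structures import Meshes

def displacement_smoothness(src_mesh: Meshes, offsets: torch.Tensor) -> torch.Tensor:
    """Penalize non-smooth vertex displacements (rather than the final vertex positions).

    src_mesh: the initial sphere mesh.
    offsets:  (V, 3) per-vertex displacements predicted by the network.
    """
    edges = src_mesh.edges_packed()  # (E, 2) vertex indices of each edge
    # The deformation is smooth when neighboring vertices move together,
    # i.e. the offset difference across each edge is small.
    diff = offsets[edges[:, 0]] - offsets[edges[:, 1]]
    return (diff ** 2).sum(dim=1).mean()
```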
For case #2, where the initial mesh can never be deformed into the target shape without changing the connectivity because of the different topology, one possible approach would be to estimate another representation that allows topology changes, such as an implicit surface representation, and then construct a mesh using the marching cubes algorithm.