16-889 Assignment 2
Yutian Lei (Andrew ID: yutianle)
1. Exploring loss functions
1.1 Fitting a voxel grid (5 points)
1.2. Fitting a point cloud (10 points)
1.3. Fitting a mesh (5 points)
2. Reconstructing 3D from a Single View
Sections 2.1, 2.2, and 2.3 visualize the 3D reconstructions as voxel grids, point clouds, and meshes, respectively, for three examples from the test dataset. The first two examples are successful cases, while the last one is a failure case.
2.1. Image to voxel grid (15 points)
2.2. Image to point cloud (15 points)
2.3. Image to mesh (15 points)
2.4. Quantitative Comparisons (10 points)
Metric | Voxel | Point Cloud | Mesh
---|---|---|---
F1@0.05 | 86.201 | 94.184 | 84.710
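For reference, the F1 metric used throughout can be sketched as follows (a minimal NumPy version with brute-force pairwise distances; the function and variable names are my own, not the assignment's evaluation code):

```python
import numpy as np

def f1_score(pred_pts, gt_pts, tau=0.05):
    """F1 between two point sets at distance threshold tau.

    Precision: fraction of predicted points within tau of some GT point.
    Recall: fraction of GT points within tau of some predicted point.
    """
    # Pairwise Euclidean distances, shape (P, G)
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)

pts = np.random.rand(100, 3)
print(f1_score(pts, pts))          # identical clouds -> 100.0
print(f1_score(pts, pts + 10.0))   # clouds far beyond tau -> 0.0
```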
2.5. Analyze effects of hyperparameter variations (10 points)
2.5.1 Voxel: Whether to Normalize the Loss Value according to Occupancy Rate
For the image-to-voxel model, empty voxels generally occupy most of an object's volume, which biases the model toward predicting "0". Moreover, the occupancy rate differs from object to object. It is therefore helpful to normalize the loss value according to the occupancy rate during training. The F1@0.05 scores for the image-to-voxel model with and without loss normalization are presented below; performance improves with normalization. Normalization also alleviates the model's discontinuity problem to some extent: the chair legs predicted without normalization in the figure below are very thin, while with normalization the front two legs are restored to a normal thickness.
Metric | Without Normalization | With Normalization
---|---|---
F1@0.05 | 86.201 | 87.668
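The normalization scheme can be sketched as a class-balanced binary cross-entropy (an illustrative formulation of my own, not necessarily the exact one used: occupied and empty voxels are each averaged within their own class, so a low occupancy rate no longer dominates the loss):

```python
import numpy as np

def balanced_bce(pred, target, eps=1e-7):
    """BCE where occupied and empty voxels contribute equally,
    regardless of the object's occupancy rate."""
    pred = np.clip(pred, eps, 1 - eps)
    pos = target == 1
    neg = ~pos
    pos_loss = -np.log(pred[pos]).mean() if pos.any() else 0.0
    neg_loss = -np.log(1 - pred[neg]).mean() if neg.any() else 0.0
    return 0.5 * (pos_loss + neg_loss)

# A mostly-empty 4x4x4 grid: plain BCE is dominated by the easy "0" voxels,
# while the balanced loss still penalizes the badly predicted occupied voxel.
target = np.zeros((4, 4, 4)); target[0, 0, 0] = 1
pred = np.full((4, 4, 4), 0.1)   # confident "empty" everywhere
plain = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
print(plain, balanced_bce(pred, target))
```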
2.5.2 Point Cloud: Number of Points
For the image-to-point-cloud model, the number of points used to represent the 3D shape affects performance. The F1@0.05 scores for different numbers of points are presented below. The experiments show that too many points hurt performance due to the model's limited capacity, while too few points may be insufficient to express the 3D structure well.
Number of Points | 2500 | 5000 | 7500
---|---|---|---
F1@0.05 | 92.908 | 94.184 | 91.202
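The capacity argument can be made concrete: in a typical point-cloud decoder the final layer emits all N·3 coordinates at once, so its parameter count grows linearly with N while the latent code stays fixed. A rough sketch under an assumed (hypothetical, not the actual) two-layer head:

```python
import numpy as np

def pointcloud_head_params(latent_dim, hidden_dim, n_points):
    """Parameter count of a hypothetical 2-layer decoder head:
    latent -> hidden -> (n_points * 3) coordinates (weights + biases)."""
    l1 = latent_dim * hidden_dim + hidden_dim
    l2 = hidden_dim * (n_points * 3) + n_points * 3
    return l1 + l2

# The output layer dominates and scales linearly with the number of points,
# while the latent bottleneck (512 here, an assumed size) does not grow.
for n in (2500, 5000, 7500):
    print(n, pointcloud_head_params(512, 1024, n))
```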
2.5.3 Mesh: Mesh Initialization
For the image-to-mesh model, an important factor affecting performance is the mesh initialization. The F1@0.05 scores are presented for two initializations: a predefined unit ico-sphere (level 4) and a "standard" chair picked from the R2N2 dataset. Note that although the F1@0.05 score with the ico-sphere initialization is only slightly lower than that of the chair initialization, the visualizations with the chair initialization are significantly better, as shown below.
Mesh Initialization | Unit Ico-Sphere (Level 4) | Chair |
---|---|---|
F1@0.05 | 82.645 | 84.710 |
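Why the initialization matters can be seen from how the mesh decoder operates: it only predicts per-vertex offsets, so the template's face list (and hence its topology) is kept throughout. A toy sketch, where a tetrahedron stands in for the real template (in the actual model this would be pytorch3d's `ico_sphere(4)` or the chair mesh), and `deform` and the constant offsets are my own stand-ins for the network:

```python
import numpy as np

# Toy stand-in for the template mesh.
verts_init = np.array([[1., 1., 1.], [1., -1., -1.],
                       [-1., 1., -1.], [-1., -1., 1.]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])

def deform(verts, offsets):
    """The decoder only predicts per-vertex offsets; the face list
    (the topology of the initialization) never changes."""
    return verts + offsets, faces

offsets = 0.1 * np.ones_like(verts_init)   # stand-in for the network output
new_verts, new_faces = deform(verts_init, offsets)
assert new_faces is faces  # connectivity is inherited from the template
```

Because the connectivity is frozen, a template that already resembles the target category (the chair) gives the offsets much less geometric work to do than a sphere.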
2.6. Interpret your model (15 Points)
Since the classic encoder-decoder structure is used for all three models in this assignment, interpolation is a good way to interpret the latent space and probe the smoothness of the learned models. Specifically, the encoded features of two selected images are linearly interpolated with a step of 0.1, and each interpolated feature is then decoded by the corresponding model to generate an output. As the figures below show, all three learned models generate a smooth transition from a chair structure (1) to a sofa structure (2), which demonstrates that the models are not merely memorizing the correspondence between input images and 3D structures but rather building a robust mapping from the latent feature space to the 3D structure space.
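The interpolation procedure can be sketched as follows (`z_chair` and `z_sofa` stand in for the encoder outputs of the two selected images; only the linear mixing is shown, the decoding step is a comment):

```python
import numpy as np

def interpolate_latents(z1, z2, steps=11):
    """Linear interpolation between two encoded features with step 0.1."""
    ts = np.linspace(0.0, 1.0, steps)           # 0.0, 0.1, ..., 1.0
    return [(1 - t) * z1 + t * z2 for t in ts]

z_chair, z_sofa = np.random.rand(512), np.random.rand(512)  # assumed latent size
zs = interpolate_latents(z_chair, z_sofa)
# Each z in zs is then passed through the trained decoder to get a 3D shape.
assert np.allclose(zs[0], z_chair) and np.allclose(zs[-1], z_sofa)
```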
2.6.1 Voxel
2.6.2 Point Cloud
2.6.3 Mesh
3. (Extra Credit) Exploring some recent architectures
3.1 Implicit network (10 points)
For the implicit network, I reimplement a simplified version of Occupancy Networks: Learning 3D Reconstruction in Function Space.
Specifically,
- In the training stage, the training points are drawn from the discrete voxel space rather than sampling points from a continuous 3D space and determining whether they lie inside or outside the corresponding meshes.
- In the inference stage, the output mesh is extracted with the simple marching cubes algorithm, similar to 2.1, rather than the Multiresolution IsoSurface Extraction (MISE) of the original paper. However, I also borrowed the MISE code from the official implementation to conduct an ablation study comparing the two mesh generation methods; the results are shown below.
- For the model, I basically reimplement the baseline model of the original paper, except that: 1) the five ResNet blocks with conditional batch normalization are implemented without the residual ("skip connection") structure; 2) the latent encoder is removed due to its complexity.
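The first simplification, drawing training points directly from the discrete voxel grid, can be sketched as follows (the names are mine; the point is that occupancy labels are simply read off the grid instead of computed with point-in-mesh tests):

```python
import numpy as np

def sample_training_points(voxels, n_samples, rng):
    """Draw query points and occupancy labels from a discrete voxel grid.

    voxels: (D, D, D) binary occupancy grid.
    Returns voxel-center coordinates in [0, 1]^3 and their occupancy labels.
    """
    D = voxels.shape[0]
    idx = rng.integers(0, D, size=(n_samples, 3))       # random voxel indices
    labels = voxels[idx[:, 0], idx[:, 1], idx[:, 2]]    # occupancy at those voxels
    points = (idx + 0.5) / D                            # voxel-center coordinates
    return points, labels

rng = np.random.default_rng(0)
voxels = np.zeros((32, 32, 32)); voxels[8:24, 8:24, 8:24] = 1
pts, labels = sample_training_points(voxels, 4096, rng)
# The implicit decoder is then trained with BCE on (image feature, point) -> label.
```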
The visualization of the results of this simplified implicit network is shown below.
The quantitative results of using naive and MISE mesh generation methods are shown below.
Metric | MISE | Naive
---|---|---
F1@0.05 | 84.115 | 83.073
3.2 Parametric network (10 points)
For the parametric network, I reimplement a simplified version of AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation.
Specifically,
- In the training stage, the training points are sampled randomly instead of using the regular sampling technique mentioned in the original paper.
- In the inference stage, only the naive mesh generation method is implemented: the unit square is mapped into 3D while keeping its connectivity. Poisson surface reconstruction (PSR) is not implemented due to its complexity.
- For the model, I basically reimplement the baseline model of the original paper, except that my model only supports a single template (i.e., primitive surface) instead of the multiple templates of the official implementation.
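The naive mesh generation above, keeping the unit square's connectivity, can be sketched as follows (the paraboloid mapping at the end is a toy stand-in for the learned, image-conditioned MLP):

```python
import numpy as np

def unit_square_grid(n):
    """Regular n x n grid on the unit square plus its triangle connectivity.

    The MLP maps each 2D point to 3D; reusing this fixed face list
    turns the mapped points into a mesh."""
    u, v = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
    uv = np.stack([u.ravel(), v.ravel()], axis=-1)          # (n*n, 2)
    faces = []
    for i in range(n - 1):
        for j in range(n - 1):
            a, b = i * n + j, i * n + j + 1
            c, d = (i + 1) * n + j, (i + 1) * n + j + 1
            faces += [[a, b, c], [b, d, c]]                 # two triangles per cell
    return uv, np.array(faces)

uv, faces = unit_square_grid(16)
# Toy stand-in for the learned mapping: lift the square onto a paraboloid.
verts3d = np.concatenate([uv, (uv ** 2).sum(-1, keepdims=True)], axis=-1)
print(uv.shape, faces.shape)   # (256, 2) (450, 3)
```

Because the face list never changes, the output mesh inherits the square's disk topology, which is exactly why a single template limits the shapes this model can close up.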
The visualization of the results of this simplified parametric network is shown below.
The quantitative results of sampling 2048 points and 4096 points in training are shown below.
Number of Points | 2048 | 4096 |
---|---|---|
F1@0.05 | 63.408 | 70.987 |