# 16-889 Assignment 2: Single View to 3D
Note: Used 5 late days for this assignment.
Goals: This assignment explores loss functions and decoder architectures for regressing to voxel, point-cloud, and mesh representations from single-view RGB input.
## 1. Exploring loss functions

This section involves defining loss functions for fitting voxels, point clouds, and meshes.
### 1.1. Fitting a voxel grid (5 points)

Run `python main.py fit_data --type 'vox'` to fit the source voxel grid to the target voxel grid.
**Voxel Loss**

| Result Voxel Grid | GT Voxel Grid |
|---|---|
| ![]() | ![]() |
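For reference, a minimal sketch of the voxel loss as I'd expect it here: binary cross-entropy between predicted occupancy and the target grid. Names and shapes are illustrative, not necessarily those in `main.py`:

```python
import torch.nn.functional as F

def voxel_loss(pred_logits, target):
    # pred_logits: (B, 32, 32, 32) raw occupancy scores,
    # target: (B, 32, 32, 32) binary ground-truth grid.
    # BCE-with-logits folds the sigmoid in for numerical stability.
    return F.binary_cross_entropy_with_logits(pred_logits, target.float())
```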
### 1.2. Fitting a point cloud (10 points)

Run `python main.py fit_data --type 'point'` to fit the source point cloud to the target point cloud.
**Chamfer Loss**

| Result Pointcloud | GT Pointcloud |
|---|---|
| ![]() | ![]() |
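A plain-PyTorch sketch of the bidirectional (squared) chamfer loss used for this fit; function and argument names are my own:

```python
import torch

def chamfer_loss(src, tgt):
    # src: (B, N, 3) predicted points, tgt: (B, M, 3) target points.
    d = torch.cdist(src, tgt) ** 2           # (B, N, M) pairwise squared distances
    loss_src = d.min(dim=2).values.mean()    # each src point -> nearest tgt point
    loss_tgt = d.min(dim=1).values.mean()    # each tgt point -> nearest src point
    return loss_src + loss_tgt
```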
### 1.3. Fitting a mesh (5 points)

Run `python main.py fit_data --type 'mesh'` to fit the source mesh to the target mesh.
**Chamfer Loss**

| Result Mesh | GT Mesh |
|---|---|
| ![]() | ![]() |
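Fitting a mesh with a chamfer objective generally means comparing points sampled from both surfaces; a sketch using PyTorch3D (the smoothing weight and point count here are illustrative):

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fit_loss(src_mesh, tgt_mesh, n_points=5000, w_smooth=0.1):
    # Compare surfaces via sampled point clouds, and regularize the
    # deforming source mesh with Laplacian smoothing.
    src_pts = sample_points_from_meshes(src_mesh, n_points)
    tgt_pts = sample_points_from_meshes(tgt_mesh, n_points)
    chamfer, _ = chamfer_distance(src_pts, tgt_pts)
    return chamfer + w_smooth * mesh_laplacian_smoothing(src_mesh)
```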
## 2. Reconstructing 3D from single view

This section involves training a single-view-to-3D pipeline for voxels, point clouds, and meshes.
### 2.1. Image to voxel grid (15 points)

```bash
# to train
python3 main.py train_model --type 'vox'
# to eval
python3 main.py eval_model --type 'vox' --load_checkpoint
```
| # | Input Image | GT Mesh | Predicted Voxel Grid |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
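The report doesn't spell out the decoder, but a typical image-to-voxel head upsamples a flat image feature with 3D transposed convolutions. A hypothetical sketch, assuming a 512-d encoder feature:

```python
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Maps an encoder feature (e.g., 512-d from a ResNet) to occupancy
    # logits on a 32x32x32 grid.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 4 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),               # 16 -> 32
        )

    def forward(self, feat):
        x = self.fc(feat).view(-1, 128, 4, 4, 4)
        return self.deconv(x)  # (B, 1, 32, 32, 32) logits
```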
### 2.2. Image to point cloud (15 points)

```bash
# to train
python3 main.py train_model --type 'point'
# to eval
python3 main.py eval_model --type 'point' --load_checkpoint
```
| # | Input Image | GT Mesh | Predicted Pointclouds |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
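Again hypothetical rather than the exact architecture used: the simplest point decoder is an MLP that regresses all N points at once from the image feature (N and the layer widths are illustrative):

```python
import torch.nn as nn

class PointDecoder(nn.Module):
    # Regresses a fixed-size point cloud directly from the image feature.
    def __init__(self, feat_dim=512, n_points=5000):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, n_points * 3),
        )

    def forward(self, feat):
        return self.mlp(feat).view(-1, self.n_points, 3)
```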
### 2.3. Image to mesh (15 points)

```bash
# to train
python3 main.py train_model --type 'mesh'
# to eval
python3 main.py eval_model --type 'mesh' --load_checkpoint
```
| # | Input Image | GT Mesh | Predicted Mesh |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
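The mesh branch deforms an initial sphere (as discussed in 2.4 and 2.5). A sketch of a per-vertex offset decoder over a PyTorch3D ico-sphere template; the feature size and MLP widths are assumptions:

```python
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    # Predicts per-vertex offsets that deform a template ico-sphere.
    def __init__(self, feat_dim=512, level=4, device="cpu"):
        super().__init__()
        self.template = ico_sphere(level, device)
        n_verts = self.template.verts_packed().shape[0]
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_verts * 3),
        )

    def forward(self, feat):
        offsets = self.mlp(feat).view(-1, 3)      # packed (B * n_verts, 3)
        batch = self.template.extend(feat.shape[0])
        return batch.offset_verts(offsets)
```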
### 2.4. Quantitative comparisons (10 points)

| # | Type | F1 Score |
|---|---|---|
| 1 | Voxel Grid | 81.348 |
| 2 | Pointcloud | 96.654 |
| 3 | Mesh | 85.459 |
The F1 score is computed on point clouds; for voxels and meshes, points are first sampled from the predictions. The point cloud has the highest F1 score because the points are predicted directly, with no intermediate conversion.

Meshes are constrained by the connectivity of the vertices and faces of the initial template. Here the template is a sphere, which is a watertight structure; it cannot be deformed into a chair that has holes. As a result, the F1 score is lower.

The F1 score for voxels is the lowest because the voxel grid must first be converted to a mesh, and points are then sampled from that mesh, so any errors in the grid propagate through the conversion. Moreover, the 32x32x32 resolution used here is too coarse to capture some of the thin structures in the chair shapes, hence the lowest F1 score.
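For reference, a sketch of a point-based F1 at a distance threshold, scaled to a percentage to match the table above (the threshold value is illustrative; the starter code defines the exact metric):

```python
import torch

def f1_score(pred, gt, thresh=0.05):
    # pred: (N, 3), gt: (M, 3) points sampled from the two shapes.
    d = torch.cdist(pred, gt)                                  # (N, M)
    precision = (d.min(dim=1).values < thresh).float().mean()  # pred points near gt
    recall = (d.min(dim=0).values < thresh).float().mean()     # gt points covered
    return 100 * 2 * precision * recall / (precision + recall + 1e-8)
```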
### 2.5. Analyse effects of hyperparameter variations (10 points)

I experimented with the `--w_chamfer` and `--w_smooth` hyperparameters; the results are shown below. With a low smoothness weight, the mesh has sharp edges and faces. This is expected: increasing the smoothness weight penalizes the loss more heavily for non-smooth geometry, pushing neighbouring vertices to be as close to co-planar as possible. A sketch of how the two weights enter the objective follows the commands below.
```bash
# to train
python3 main.py train_model --type 'mesh' --w_smooth 0.4
python3 main.py train_model --type 'mesh' --w_smooth 1
python3 main.py train_model --type 'mesh' --w_smooth 1.5
```
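How the two flags presumably combine in the training objective (a sketch; names and defaults are mine, not the starter code's):

```python
def total_loss(loss_chamfer, loss_smooth, w_chamfer=1.0, w_smooth=0.4):
    # Weighted sum of fidelity and regularization terms: a larger
    # w_smooth biases the optimum toward flatter, more co-planar
    # faces at the cost of sharp geometric detail.
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```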
| `--w_smooth=0.4` | `--w_smooth=1` | `--w_smooth=1.5` |
|---|---|---|
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Another experiment I tried was using a chair mesh as the initial template instead of the sphere mesh. My hypothesis was that, since a sphere inherently cannot represent chair models with holes, starting from a generic chair should help the network learn to represent such complex chairs. However, the model didn't learn any better: the loss saturated after a few epochs and the generated outputs showed no improvement. The results are shown below.
| Sample result 1 | Sample result 2 | Sample result 3 |
|---|---|---|
| ![]() | ![]() | ![]() |
### 2.6. Interpret your model (15 points)

To understand my model, I wanted to check whether it is learning the distinguishing features of chairs, e.g., some chairs have thin legs, some have holes, some are flat and wide. If the model learns these features correctly, it should be able to identify similar objects. To test this, I checked whether the model can take a query chair and return similar ones. I used three types of chairs; the corresponding results are shown below.

For this experiment, I chose the point-cloud model and extracted features from the second-to-last layer. All models' features are indexed, and a KNN search returns the models closest to a given query model; a sketch of this retrieval follows.

From these results, the model does appear to learn the individual features of the chairs.
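A sketch of the retrieval procedure. The `extract_features` hook is hypothetical; in practice the activations are read from the second-to-last layer of the trained point-cloud model:

```python
import torch

@torch.no_grad()
def retrieve_similar(model, query_img, gallery_imgs, k=3):
    # Embed the query and all gallery images with the trained encoder,
    # then rank the gallery by L2 distance in feature space.
    q = torch.as_tensor(model.extract_features(query_img))     # (D,)
    index = torch.stack(
        [torch.as_tensor(model.extract_features(im)) for im in gallery_imgs]
    )                                                          # (N, D)
    dists = torch.cdist(q.unsqueeze(0), index).squeeze(0)      # (N,)
    return dists.topk(k, largest=False).indices                # k nearest models
```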
| Query Object | Result 1 | Result 2 | Result 3 |
|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() |
## 3. (Extra Credit) Exploring some recent architectures

### 3.1 Implicit network (10 points)

Implement an implicit decoder that takes 3D locations as input and outputs the occupancy value. Some papers for inspiration: [1,2]
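A minimal occupancy-network-style sketch of such a decoder: concatenate the image feature with each query coordinate and predict an occupancy logit. Dimensions are assumptions:

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    # f(feature, xyz) -> occupancy logit at continuous 3D locations.
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat, xyz):
        # feat: (B, feat_dim), xyz: (B, N, 3) query locations.
        feat = feat.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        return self.mlp(torch.cat([feat, xyz], dim=-1)).squeeze(-1)  # (B, N)
```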
### 3.2 Parametric network (10 points)

Implement a parametric function that takes sampled 2D points as input and outputs their corresponding 3D points. Some papers for inspiration: [1,2]
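A FoldingNet/AtlasNet-style sketch: an MLP conditioned on the image feature folds 2D samples from the unit square into 3D surface points. Dimensions are assumptions:

```python
import torch
import torch.nn as nn

class ParametricDecoder(nn.Module):
    # g(feature, uv) -> a 3D point for each sampled 2D parameter.
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat, uv):
        # feat: (B, feat_dim), uv: (B, N, 2) points sampled in [0, 1]^2.
        feat = feat.unsqueeze(1).expand(-1, uv.shape[1], -1)
        return self.mlp(torch.cat([feat, uv], dim=-1))  # (B, N, 3)
```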