16-889 Assignment 2: Single View to 3D

Goals: In this assignment, you will explore loss and decoder functions for regressing to voxel, point cloud, and mesh representations from single-view RGB input.

0. Setup

Please download and extract the dataset from here. After unzipping, set the appropriate path references in the dataset_location.py file.

Make sure you have installed the packages listed in requirements.txt. This assignment requires the GPU version of PyTorch.

1. Exploring loss functions

This section involves defining loss functions for fitting voxel grids, point clouds, and meshes.

1.1. Fitting a voxel grid (5 points)

Left: GT, Right: Prediction

(Figure: voxel grid fitting result)
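A minimal sketch of such a loss, assuming the decoder outputs unnormalized occupancy logits and the ground truth is a binary occupancy grid (binary cross-entropy is one common choice; the tensor shapes here are assumptions):

```python
import torch.nn.functional as F

def voxel_loss(voxel_pred, voxel_gt):
    # voxel_pred: (B, D, H, W) unnormalized occupancy logits from the decoder
    # voxel_gt:   (B, D, H, W) binary ground-truth occupancy in {0, 1}
    return F.binary_cross_entropy_with_logits(voxel_pred, voxel_gt.float())
```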

1.2. Fitting a point cloud (10 points)

(Figure: point cloud fitting result)
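A minimal sketch of a symmetric chamfer loss, assuming batched (B, N, 3) point tensors; pytorch3d.ops.knn_points is used here for the nearest-neighbour queries:

```python
from pytorch3d.ops import knn_points

def chamfer_loss(points_src, points_tgt):
    # points_src: (B, N, 3), points_tgt: (B, M, 3)
    # knn_points returns squared distances to the K nearest neighbours.
    d_src = knn_points(points_src, points_tgt, K=1).dists  # (B, N, 1)
    d_tgt = knn_points(points_tgt, points_src, K=1).dists  # (B, M, 1)
    return d_src.mean() + d_tgt.mean()
```

Averaging over both directions penalizes predictions that miss target geometry as well as predictions that place spurious points far from the target.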

1.3. Fitting a mesh (5 points)

(Figure: mesh fitting result)
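A minimal sketch of a combined mesh-fitting objective, assuming a chamfer term on points sampled from the predicted mesh plus a Laplacian smoothness regularizer (the weights shown are assumptions, not tuned values):

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(mesh_pred, points_gt, w_chamfer=1.0, w_smooth=0.1):
    # mesh_pred: a pytorch3d Meshes object; points_gt: (B, N, 3) target points
    points_pred = sample_points_from_meshes(mesh_pred, num_samples=points_gt.shape[1])
    loss_chamfer, _ = chamfer_distance(points_pred, points_gt)
    loss_smooth = mesh_laplacian_smoothing(mesh_pred)
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```

Sampling points from the mesh surface keeps the chamfer term differentiable with respect to the vertex positions.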

2. Reconstructing 3D from single view

This section involves training a single-view-to-3D pipeline for voxel grids, point clouds, and meshes. Use the save_freq argument in train_model.py to control how frequently model checkpoints are saved.

2.1. Image to voxel grid (15 points)

(Figures: three examples of image-to-voxel-grid reconstruction)
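One possible decoder sketch, assuming a global 512-dimensional image feature from the encoder and a 32³ output grid (both numbers are assumptions; the actual architecture may differ):

```python
import torch.nn as nn

class VoxelDecoder(nn.Module):
    """Maps a global image feature to a grid of occupancy logits."""

    def __init__(self, feat_dim=512, vox_size=32):
        super().__init__()
        self.vox_size = vox_size
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, vox_size ** 3),
        )

    def forward(self, feat):
        # feat: (B, feat_dim) -> (B, D, H, W) occupancy logits
        out = self.fc(feat)
        return out.view(-1, self.vox_size, self.vox_size, self.vox_size)
```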

2.2. Image to point cloud (15 points)

(Figures: three examples of image-to-point-cloud reconstruction)
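A corresponding point cloud decoder sketch, again assuming a 512-dimensional image feature; n_points and the tanh output range are assumptions:

```python
import torch.nn as nn

class PointDecoder(nn.Module):
    """Maps a global image feature to n_points 3D coordinates."""

    def __init__(self, feat_dim=512, n_points=5000):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_points * 3),
            nn.Tanh(),  # keep coordinates in a bounded range
        )

    def forward(self, feat):
        # feat: (B, feat_dim) -> (B, n_points, 3) point cloud
        return self.fc(feat).view(-1, self.n_points, 3)
```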

2.3. Image to mesh (15 points)

(Figures: three examples of image-to-mesh reconstruction)
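A mesh decoder sketch, assuming the network deforms an ico_sphere template by predicting per-vertex offsets (the sphere level and the tanh bound are assumptions):

```python
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    """Deforms an ico_sphere template by predicting per-vertex offsets."""

    def __init__(self, feat_dim=512, level=4, device="cuda"):
        super().__init__()
        self.template = ico_sphere(level, device)
        self.n_verts = self.template.verts_packed().shape[0]
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, self.n_verts * 3),
            nn.Tanh(),  # keep offsets bounded
        )

    def forward(self, feat):
        # feat: (B, feat_dim) -> batch of B deformed meshes
        offsets = self.fc(feat).view(-1, 3)  # (B * n_verts, 3), packed layout
        return self.template.extend(feat.shape[0]).offset_verts(offsets)
```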

2.4. Quantitative comparisons (10 points)

The average F1 scores for the voxel grid, point cloud, and mesh models are 56.851, 96.527, and 90.071, respectively.

The voxel grid performs worst because the ground-truth occupancy grid is sparse, which makes it hard to find a well-balanced loss for training. In addition, the marching-cubes step used to extract a surface from the predicted grid introduces further error.

The point cloud and mesh models achieve similar results because the dataset contains only chairs and both representations receive a direct point-level supervision signal.
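For reference, a minimal sketch of how an F1 score at a distance threshold can be computed between sampled predicted and ground-truth points (the 0.05 threshold is an assumption):

```python
from pytorch3d.ops import knn_points

def f1_score(points_pred, points_gt, threshold=0.05):
    # points_pred, points_gt: (B, N, 3); knn_points returns squared distances.
    d_pred = knn_points(points_pred, points_gt, K=1).dists.squeeze(-1).sqrt()
    d_gt = knn_points(points_gt, points_pred, K=1).dists.squeeze(-1).sqrt()
    precision = (d_pred < threshold).float().mean()
    recall = (d_gt < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```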

2.5. Analyse effects of hyperparameter variations (10 points)

Analyse the results by varying a hyperparameter of your choice, for example n_points, vox_size, w_chamfer, or the initial mesh (ico_sphere). Try to be unique and conclusive in your analysis.

I changed w_chamfer from 1.0 to 10.0 and retrained the mesh model. The average F1 score improved from 90.071 to 90.926. Qualitative results are shown below; from left to right: the input image, the result from Q2.3 (w_chamfer = 1.0), and the result with w_chamfer = 10.0.

We can observe that although the larger w_chamfer gives a better global reconstruction, the surface is locally less smooth. This also shows that the average F1 score alone is not a fully reliable quality measure.

(Figure: input image, Q2.3 result with w_chamfer = 1.0, and result with w_chamfer = 10.0)

2.6. Interpret your model (15 points)

I input a rendering of a simple sphere, shown in the left figure; the output is shown on the right. The model still generates a chair-shaped point cloud because it saw only chairs during training. This suggests the model behaves like a nearest-neighbour retrieval over the training shapes rather than performing genuine reconstruction, and therefore lacks generalization ability.

(Figure: input sphere rendering and predicted point cloud)
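A sketch of this probe, assuming the trained image-to-point-cloud model object was saved whole; the checkpoint and image paths are illustrative, not the actual script outputs:

```python
import torch
from torchvision.io import read_image

# Assumed checkpoint path; assumes the full model object (not a state dict) was saved.
model = torch.load("checkpoint_point.pth")
model.eval().cuda()

# Render of a plain sphere, i.e. an out-of-distribution input for a chairs-only model.
img = read_image("sphere_render.png").float() / 255.0  # (3, H, W) in [0, 1]
with torch.no_grad():
    points_pred = model(img.unsqueeze(0).cuda())        # (1, N, 3) predicted points
```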