16-889 Assignment 2: Single View to 3D

I used three late days for this assignment.


1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

To run: python fit_data.py

[Renders: Ground Truth | Optimized Voxel Grid]
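
The objective optimized here is a binary cross-entropy between the predicted occupancies and the ground-truth voxel grid. A minimal sketch of such a loss, assuming the prediction is given as raw logits (the tensor shapes and names are placeholders, not necessarily those used in fit_data.py):

    import torch
    import torch.nn.functional as F

    def voxel_loss(pred_logits: torch.Tensor, gt_voxels: torch.Tensor) -> torch.Tensor:
        # pred_logits: (B, D, H, W) raw occupancy scores; gt_voxels: (B, D, H, W) in {0, 1}
        return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels.float())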

1.2. Fitting a point cloud (10 points)

To run: python fit_data.py --type 'point'

[Renders: Ground Truth | Optimized Point Cloud]
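
Point-cloud fitting minimizes a symmetric Chamfer distance between the optimized and ground-truth point sets. A minimal sketch, assuming pytorch3d.ops.knn_points is used for the nearest-neighbor lookup (the actual implementation in the assignment code may differ):

    import torch
    from pytorch3d.ops import knn_points

    def chamfer_loss(src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
        # src: (B, N, 3), tgt: (B, M, 3)
        d_src = knn_points(src, tgt, K=1).dists[..., 0]  # squared dist to nearest tgt point
        d_tgt = knn_points(tgt, src, K=1).dists[..., 0]  # squared dist to nearest src point
        return d_src.mean() + d_tgt.mean()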

1.3. Fitting a mesh (5 points)

To run: python fit_data.py --type 'mesh'

[Renders: Ground Truth | Optimized Mesh]
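
Mesh fitting optimizes per-vertex offsets of a template mesh so that points sampled from it match points sampled from the target, with a Laplacian term keeping the surface smooth. A minimal sketch of one such optimization loop using PyTorch3D utilities (the template resolution, learning rate, weights, and the stand-in ground-truth points are all placeholders):

    import torch
    from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
    from pytorch3d.ops import sample_points_from_meshes
    from pytorch3d.utils import ico_sphere

    # stand-in for points sampled from the ground-truth mesh (placeholder data)
    gt_points = torch.rand(1, 5000, 3) * 2 - 1

    src_mesh = ico_sphere(4)  # template mesh whose vertices we optimize
    offsets = torch.zeros_like(src_mesh.verts_packed(), requires_grad=True)
    optimizer = torch.optim.Adam([offsets], lr=1e-2)

    for step in range(2000):
        optimizer.zero_grad()
        new_mesh = src_mesh.offset_verts(offsets)
        pred_points = sample_points_from_meshes(new_mesh, 5000)
        loss = (chamfer_distance(pred_points, gt_points)[0]
                + 0.1 * mesh_laplacian_smoothing(new_mesh))
        loss.backward()
        optimizer.step()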

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

Train: python train_model.py --type 'vox'. Evaluate: python eval_model.py --type 'vox' --load_checkpoint

[Renders, 3 examples: RGB Image | Ground Truth | Predicted Voxel Grid]
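
A minimal sketch of the kind of decoder such a model could use, assuming a 512-dimensional global image feature and a 32^3 output grid (the actual architecture in the assignment code may differ):

    import torch
    import torch.nn as nn

    class VoxelDecoder(nn.Module):
        """Maps a global image feature to occupancy logits on a 32^3 grid."""
        def __init__(self, feat_dim: int = 512, grid: int = 32):
            super().__init__()
            self.grid = grid
            self.fc = nn.Sequential(
                nn.Linear(feat_dim, 1024), nn.ReLU(),
                nn.Linear(1024, 2048), nn.ReLU(),
                nn.Linear(2048, grid ** 3),
            )

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            # feat: (B, feat_dim) -> logits: (B, grid, grid, grid)
            return self.fc(feat).view(-1, self.grid, self.grid, self.grid)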

2.2. Image to point cloud (15 points)

Train: python train_model.py --type 'point'. Evaluate: python eval_model.py --type 'point' --load_checkpoint

[Renders, 4 examples: RGB Image | Ground Truth | Predicted Point Cloud]
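
A minimal sketch of a point-cloud decoder of this kind, again assuming a 512-dimensional image feature; the layer sizes and point count are placeholders rather than the exact values behind the results above:

    import torch
    import torch.nn as nn

    class PointDecoder(nn.Module):
        """Maps a global image feature to an (N, 3) point cloud."""
        def __init__(self, feat_dim: int = 512, n_points: int = 5000):
            super().__init__()
            self.n_points = n_points
            self.fc = nn.Sequential(
                nn.Linear(feat_dim, 1024), nn.ReLU(),
                nn.Linear(1024, 2048), nn.ReLU(),
                nn.Linear(2048, n_points * 3), nn.Tanh(),  # keep coordinates in [-1, 1]
            )

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            # feat: (B, feat_dim) -> points: (B, n_points, 3)
            return self.fc(feat).view(-1, self.n_points, 3)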

2.3. Image to mesh (15 points)

Train: python train_model.py --type 'mesh'. Evaluate: python eval_model.py --type 'mesh' --load_checkpoint

[Renders, 5 examples: RGB Image | Ground Truth | Predicted Mesh]
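
A common design for this setting, sketched here purely as an assumption about the implementation, is to predict per-vertex offsets that deform a fixed ico-sphere template:

    import torch
    import torch.nn as nn
    from pytorch3d.utils import ico_sphere

    class MeshDecoder(nn.Module):
        """Predicts per-vertex offsets that deform an ico-sphere template."""
        def __init__(self, feat_dim: int = 512, level: int = 4):
            super().__init__()
            self.template = ico_sphere(level)
            n_verts = self.template.verts_packed().shape[0]
            self.fc = nn.Sequential(
                nn.Linear(feat_dim, 1024), nn.ReLU(),
                nn.Linear(1024, 2048), nn.ReLU(),
                nn.Linear(2048, n_verts * 3), nn.Tanh(),
            )

        def forward(self, feat: torch.Tensor):
            # feat: (B, feat_dim) -> batch of deformed meshes
            offsets = self.fc(feat).view(-1, 3)  # packed (B * n_verts, 3)
            meshes = self.template.extend(feat.shape[0]).to(feat.device)
            return meshes.offset_verts(offsets)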

2.4. Quantitative comparisons (10 points)

To evaluate each representation, run: python eval_model.py --type 'vox'|'point'|'mesh' --load_checkpoint

Method       | Avg. Test F1 Score
Voxels       | 77.453
Point Cloud  | 93.154
Mesh         | 93.553

Voxels performed the worst in terms of F1, while the point cloud and mesh performed similarly. The point cloud and mesh representations sample points directly from their outputs and can express thin structures better than a fixed-resolution voxel grid. With voxels, a thin structure is either missed entirely or predicted much thicker than it really is, resulting in more false positives.
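
For reference, the F1 score is computed on points sampled from each representation: a predicted point counts as correct if it lies within a small distance of some ground-truth point (precision), and symmetrically for recall. A minimal sketch, assuming a 0.05 distance threshold (the threshold actually used by eval_model.py may differ):

    import torch
    from pytorch3d.ops import knn_points

    def f1_score(pred: torch.Tensor, gt: torch.Tensor, thresh: float = 0.05) -> torch.Tensor:
        # pred, gt: (B, N, 3) points sampled from the predicted / ground-truth shapes
        d_pred = knn_points(pred, gt, K=1).dists[..., 0].sqrt()  # pred -> nearest gt
        d_gt = knn_points(gt, pred, K=1).dists[..., 0].sqrt()    # gt -> nearest pred
        precision = 100.0 * (d_pred < thresh).float().mean()
        recall = 100.0 * (d_gt < thresh).float().mean()
        return 2 * precision * recall / (precision + recall + 1e-8)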

2.5. Analyse effects of hyperparameter variations (10 points)

I tried different values of w_smooth to investigate its effect on the predicted meshes.

As we increase the smoothing weight w_smooth, the Chamfer term is effectively down-weighted: the model cares less about matching the sampled point clouds and focuses instead on keeping the mesh smooth. Concretely, the training objective is a weighted sum of the two terms (sketched below), and the renders that follow show the effect of different w_smooth values.
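
A minimal sketch of that weighted objective (the function and argument names are placeholders; the assignment code may weight or normalize the terms slightly differently):

    from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
    from pytorch3d.ops import sample_points_from_meshes

    def mesh_loss(pred_mesh, gt_points, w_chamfer=1.0, w_smooth=0.1, n_samples=5000):
        # compare points sampled from the predicted mesh against ground-truth samples
        pred_points = sample_points_from_meshes(pred_mesh, n_samples)
        loss_chamfer = chamfer_distance(pred_points, gt_points)[0]
        loss_smooth = mesh_laplacian_smoothing(pred_mesh)
        # a large w_smooth lets the smoothness term dominate the point-matching term
        return w_chamfer * loss_chamfer + w_smooth * loss_smooth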

[Renders, 5 examples per row: predicted meshes for w_smooth = 0.1, 100.0, 1000.0]

Quantitatively, higher smoothness weights performed progressively worse.

Smoothness Weight | Avg. Test F1 Score
0.1               | 93.553
100.0             | 91.254
1000.0            | 88.124

Overall, w_smooth = 100.0 looks best visually: it has fewer pointy artifacts than 0.1 while still capturing more detail than 1000.0. In effect, this hyperparameter acts as a regularizer.

2.6. Interpret your model (15 points)

We can record the outputs of each layer in the decoder and cluster the resulting feature vectors across the entire test set with k-means under the L2 (Euclidean) distance. Running k-means with 10 clusters on the second-to-last decoder layer yielded the following:

[Renders, 5 examples per row: chairs assigned to clusters 1-4]

Each row shows chairs assigned to the same cluster, i.e., chairs the model considers similar. Some clusters are more interpretable than others: cluster 1 tends to contain similarly sized chairs with similar orientations, while cluster 4 groups chairs whose legs splay farther apart. While these are not perfectly "visible" features, the model regards these chairs as similar in feature space.
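
A minimal sketch of how these intermediate features can be captured and clustered, assuming a forward hook on the decoder's second-to-last layer and scikit-learn's KMeans; `model`, `test_loader`, and the module path `model.decoder[-2]` are placeholders for the corresponding objects in the assignment code:

    import numpy as np
    import torch
    from sklearn.cluster import KMeans

    features = []

    def hook(module, inputs, output):
        # flatten each sample's activation into a single feature vector
        features.append(output.detach().flatten(start_dim=1).cpu())

    handle = model.decoder[-2].register_forward_hook(hook)  # placeholder module path
    with torch.no_grad():
        for images, _ in test_loader:                       # placeholder dataloader
            model(images)
    handle.remove()

    feats = torch.cat(features).numpy()
    labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(feats)
    clusters = {c: np.where(labels == c)[0] for c in range(10)}  # example indices per cluster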

3. (Extra Credit) Exploring some recent architectures.

3.1. Implicit network (10 points)

Implement an implicit decoder that takes 3D locations as input and outputs occupancy values. Some papers for inspiration: [1, 2].
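
A minimal sketch of what such an implicit decoder could look like, assuming a per-point MLP conditioned on a global image feature (the feature dimension and layer sizes are placeholders):

    import torch
    import torch.nn as nn

    class ImplicitDecoder(nn.Module):
        """Predicts occupancy logits for arbitrary 3D query locations."""
        def __init__(self, feat_dim: int = 512, hidden: int = 512):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, feat: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
            # feat: (B, feat_dim) global image feature, points: (B, N, 3) query locations
            B, N, _ = points.shape
            x = torch.cat([points, feat[:, None, :].expand(B, N, -1)], dim=-1)
            return self.mlp(x).squeeze(-1)  # (B, N) occupancy logits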

3.2. Parametric network (10 points)

Train: python train_model.py --type 'parametric'. Evaluate: python eval_model.py --type 'parametric' --load_checkpoint

I sample points on a 2D plane and sum the outputs of three different MLPs to predict the 3D point locations. The average test F1 score was 0.906 (i.e., 90.6 on the percentage scale used above). Note that this model was trained for only 1,000 steps, unlike the previous models, which were trained for 10,000 steps. Performance could potentially improve with larger MLPs; however, the complexity and size of the model would also grow considerably.
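
A minimal sketch of this parametric decoder as described: sample 2D points on the unit square, concatenate each with the image feature, pass them through three separate MLPs, and sum the three outputs to obtain the 3D points (the layer sizes and point count are placeholders, not the exact configuration behind the 0.906 score):

    import torch
    import torch.nn as nn

    class ParametricDecoder(nn.Module):
        """Maps 2D samples on a plane plus an image feature to 3D points."""
        def __init__(self, feat_dim: int = 512, hidden: int = 512):
            super().__init__()
            def mlp():
                return nn.Sequential(
                    nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 3),
                )
            self.branches = nn.ModuleList([mlp() for _ in range(3)])

        def forward(self, feat: torch.Tensor, n_points: int = 2000) -> torch.Tensor:
            # feat: (B, feat_dim)
            B = feat.shape[0]
            uv = torch.rand(B, n_points, 2, device=feat.device)  # samples on the plane
            x = torch.cat([uv, feat[:, None, :].expand(B, n_points, -1)], dim=-1)
            # sum the three branch predictions to get the final points: (B, n_points, 3)
            return torch.stack([branch(x) for branch in self.branches]).sum(dim=0)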

[Renders, 3 examples: RGB Image | Ground Truth | Predicted Point Cloud]