16889-PROJECT2

Zheren_Zhu

1. Exploring loss functions

This section involves defining loss functions for fitting voxel grids, point clouds, and meshes.

1.1. Fitting a voxel grid

In this subsection, we define a binary cross-entropy loss for fitting a 3D binary voxel grid. The loss function is defined in the losses.py file.
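A minimal sketch of this loss, assuming vox_src holds per-voxel occupancy probabilities (the decoder ends in a sigmoid) and vox_tgt holds binary ground-truth occupancies:

import torch

def voxel_loss(vox_src, vox_tgt):
    # vox_src: predicted occupancy probabilities in [0, 1], e.g. (B, 1, 32, 32, 32)
    # vox_tgt: binary ground-truth occupancies of the same shape, as a float tensor
    return torch.nn.functional.binary_cross_entropy(vox_src, vox_tgt)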

vox_tgt:

vox_src:

1.2. Fitting a point cloud

In this subsection, we define a chamfer loss for fitting a 3D point cloud. The loss function is defined in the losses.py file.
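A minimal brute-force sketch, assuming pointcloud_src and pointcloud_tgt are (B, N, 3) and (B, M, 3) tensors (a KNN routine such as pytorch3d's would scale better):

import torch

def chamfer_loss(pointcloud_src, pointcloud_tgt):
    # squared pairwise distances between every source and target point: (B, N, M)
    d = torch.cdist(pointcloud_src, pointcloud_tgt) ** 2
    # nearest-neighbor distance in each direction, averaged over points
    loss_src = d.min(dim=2).values.mean()  # source -> target
    loss_tgt = d.min(dim=1).values.mean()  # target -> source
    return loss_src + loss_tgt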

pointcloud_tgt:

pointcloud_src:

1.3. Fitting a mesh

In this subsection, we define an additional smoothness loss that helps us fit a mesh. The loss function is defined in the losses.py file.
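A minimal sketch, assuming mesh_src is a pytorch3d Meshes object; a uniform Laplacian smoothing term penalizes each vertex's deviation from the centroid of its neighbors:

from pytorch3d.loss import mesh_laplacian_smoothing

def smoothness_loss(mesh_src):
    # pulls each vertex toward the average of its neighbors, discouraging spiky geometry
    return mesh_laplacian_smoothing(mesh_src, method="uniform")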

mesh_tgt:

mesh_src:

2. Reconstructing 3D from single view

This section involves training a single-view-to-3D pipeline for voxel grids, point clouds, and meshes.

2.1. Image to voxel grid

Decoder NN structure:

nn.ConvTranspose3d(64, 64, kernel_size=2, dilation=6),

nn.ConvTranspose3d(64, 128, kernel_size=2, dilation=6),

nn.ConvTranspose3d(128, 128, kernel_size=4, dilation=3),

nn.ConvTranspose3d(128, 128, kernel_size=4),

nn.ConvTranspose3d(128, 64, kernel_size=4),

nn.ConvTranspose3d(64, 1, kernel_size=4),

There are six 3D transposed convolutional layers. Each layer is followed by a ReLU activation, except for the last, which is followed by a sigmoid activation.
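Assembled as a sketch below; the (64, 2, 2, 2) input shape is an assumption (a 512-d encoder feature reshaped to 64 channels on a 2x2x2 grid), under which these layers upsample the spatial size 2 -> 8 -> 14 -> 23 -> 26 -> 29 -> 32, matching the 32x32x32 voxel grid:

import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.ConvTranspose3d(64, 64, kernel_size=2, dilation=6), nn.ReLU(),
    nn.ConvTranspose3d(64, 128, kernel_size=2, dilation=6), nn.ReLU(),
    nn.ConvTranspose3d(128, 128, kernel_size=4, dilation=3), nn.ReLU(),
    nn.ConvTranspose3d(128, 128, kernel_size=4), nn.ReLU(),
    nn.ConvTranspose3d(128, 64, kernel_size=4), nn.ReLU(),
    nn.ConvTranspose3d(64, 1, kernel_size=4), nn.Sigmoid(),
)

feat = torch.randn(8, 512)                      # stand-in for the (B, 512) encoder feature
vox_pred = decoder(feat.view(-1, 64, 2, 2, 2))  # (B, 1, 32, 32, 32) occupancy probabilities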

From left to right: input RGB image, ground truth, prediction

2.2. Image to point cloud

Decoder NN structure:

nn.Linear(512, 2048),

nn.Linear(2048, 4096),

nn.Linear(4096, 4096),

nn.Linear(4096, 3000),

There are four linear layers. Each linear layer is followed by a batch normalization layer, a ReLU activation, and a dropout layer, except for the last layer. The batch size is 32.
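A sketch of the full decoder; the reshape of the 3000 outputs into 1000 xyz points and the dropout probability are assumptions:

import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    def __init__(self, n_points=1000, p_drop=0.2):  # both values are assumptions
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(512, 2048), nn.BatchNorm1d(2048), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(2048, 4096), nn.BatchNorm1d(4096), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(4096, 4096), nn.BatchNorm1d(4096), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(4096, n_points * 3),
        )

    def forward(self, feat):
        # feat: (B, 512) encoder feature -> (B, n_points, 3) xyz coordinates
        return self.net(feat).view(-1, self.n_points, 3)

points = PointDecoder()(torch.randn(32, 512))  # (32, 1000, 3)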

From left to right: input RGB image, ground truth, prediction

2.3. Image to mesh

Decoder NN structure:

nn.Linear(512, 2048),

nn.Linear(2048, 4096),

nn.Linear(4096, 4096),

nn.Linear(4096, 3000),

There are four linear layers. Each linear layer is followed by a batch normalization layer, a ReLU activation, and a dropout layer, except for the last layer. The batch size is 32.
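The decoder output here is interpreted as per-vertex offsets applied to a template sphere rather than absolute point positions (Section 2.4 notes the mesh is deformed from a sphere). A sketch of that final step, using a hypothetical low-resolution ico-sphere; in the actual model the 3000-d output would match the template's vertex count times three:

import torch
from pytorch3d.utils import ico_sphere

mesh_template = ico_sphere(2)                    # level-2 sphere, 162 vertices (assumption)
n_verts = mesh_template.verts_packed().shape[0]
decoder_output = torch.randn(1, n_verts * 3)     # stand-in for the MLP decoder's output
# one xyz offset per template vertex, deforming the sphere toward the target shape
mesh_pred = mesh_template.offset_verts(decoder_output.view(-1, 3))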

From left to right: input RGB image, ground truth, prediction

2.4. Quantitative comparisons

Quantitatively compare the F1 score of 3D reconstruction for meshes vs. point clouds vs. voxel grids, and provide an intuitive explanation justifying the comparison.

Voxel_F-score@0.05: 55.068

Pointcloud_F-score@0.05: 78.213

Mesh_F-score@0.05: 90.732

Explanation:

For voxels, we optimize a binary cross-entropy loss and must predict the occupancy of every cell in the grid. Also, the voxel prediction resolution is 32x32x32, which is coarse relative to the point sampling used for evaluation.

For point clouds, the network only has to learn the spatial locations of the points, so even a simple network gets a better result. Also, the chamfer loss used during optimization directly minimizes point-to-point distances, so its objective closely matches the F1 metric.

For meshes, we similarly use a chamfer loss, computed on points sampled from the surface, to deform the mesh from a sphere; the continuous predicted surface can be sampled densely, which likely explains the highest F1 score.
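For reference, a sketch of the F1@0.05 metric on sampled surface points, under the usual definition where precision counts predicted points within the distance threshold of the ground truth and recall the reverse:

import torch

def f1_at_threshold(pred_pts, gt_pts, thresh=0.05):
    # pred_pts: (N, 3) points sampled from the prediction; gt_pts: (M, 3) from the target
    d = torch.cdist(pred_pts, gt_pts)                          # (N, M) pairwise distances
    precision = (d.min(dim=1).values < thresh).float().mean()  # prediction -> ground truth
    recall = (d.min(dim=0).values < thresh).float().mean()     # ground truth -> prediction
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)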

2.5. Analyze effects of hyperparameter variations

w_smooth: A higher w_smooth value leads to a smoother, more continuous output, but also to a lower F1 score. The ideal w_smooth value should be below 10.
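For context, a sketch of how w_smooth enters the mesh training objective, reusing the chamfer and smoothness losses from Section 1 (the exact form in the training script may differ):

def mesh_loss(pred_pts, gt_pts, mesh_pred, w_smooth):
    # chamfer data term plus a Laplacian smoothness regularizer weighted by w_smooth
    return chamfer_loss(pred_pts, gt_pts) + w_smooth * smoothness_loss(mesh_pred)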

2.6. Interpret your model

Visualization of the training process of a point cloud prediction.