average test F1 score @ 0.05:
voxel: 89.468
point: 96.264
mesh: 79.046
I explored the decoder architecture a bit for the point cloud and mesh representations.
The first is a conv3d architecture very similar to the one proposed in Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images (Fig. 2).
The second is a vanilla MLP that maps the encoder latent vector directly to the output point cloud / mesh.
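The two decoder variants can be sketched roughly as below. This is an illustrative PyTorch sketch, not the exact configuration used in these experiments: the latent dimension, layer widths, grid resolution, and number of output points are all assumed values, and the conv3d branch only approximates the Pix2Vox-style upsampling path.

```python
# Illustrative sketch of the two decoders compared in the experiments.
# All sizes (latent_dim, widths, n_points) are assumptions, not the
# settings actually used.
import torch
import torch.nn as nn


class MLPDecoder(nn.Module):
    """Vanilla MLP: encoder latent vector -> N x 3 coordinates."""

    def __init__(self, latent_dim=128, n_points=256):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, n_points * 3),
        )

    def forward(self, z):
        # (B, latent_dim) -> (B, n_points, 3)
        return self.net(z).view(-1, self.n_points, 3)


class Conv3dDecoder(nn.Module):
    """Pix2Vox-style decoder: project the latent vector onto a small 3D
    grid, upsample it with transposed 3D convolutions, then map the grid
    features to output coordinates with a linear head."""

    def __init__(self, latent_dim=128, n_points=256):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Linear(latent_dim, 64 * 2 * 2 * 2)  # seed 2^3 grid
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 2^3 -> 4^3
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 4^3 -> 8^3
        )
        self.head = nn.Linear(16 * 8 ** 3, n_points * 3)

    def forward(self, z):
        x = self.fc(z).view(-1, 64, 2, 2, 2)
        x = self.deconv(x)
        # (B, 16, 8, 8, 8) -> (B, n_points, 3)
        return self.head(x.flatten(1)).view(-1, self.n_points, 3)
```

The intuition behind the comparison: the conv3d path forces the decoder to build an explicit spatial grid of features before predicting coordinates, whereas the MLP must encode all spatial structure in fully connected weights.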
Their corresponding test F1 scores (@ 0.05):
point with conv3d: 96.264
point with mlp: 89.733
mesh with conv3d: 81.844
mesh with mlp: 79.046
It seems that using conv3d brings a large benefit to point cloud prediction (+6.5 F1) and a smaller one to mesh prediction (+2.8 F1).