Voxel F1@0.05: 73.42
PointCloud F1@0.05: 89.56
Mesh F1@0.05: 84.81
The F1 score is highest for point clouds, followed by meshes, and lowest for voxels. For voxels, the network must predict occupancy for every cell of the grid, and at low resolution the grid cannot represent all the details of the shape. A point cloud, in contrast, directly predicts surface points, so it can represent more detail than a low-resolution voxel grid; it also does not require learning connectivity information, which is why point clouds do slightly better than meshes.
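For context, F1@0.05 is typically computed by treating both the prediction and the ground truth as point sets: precision is the fraction of predicted points within 0.05 of some ground-truth point, recall is the fraction of ground-truth points within 0.05 of some predicted point, and F1 is their harmonic mean. Below is a minimal sketch of such a computation, assuming both shapes have already been converted to sampled point sets (the function name and the scipy-based implementation are our own illustration, not necessarily the exact evaluation code):

```python
import numpy as np
from scipy.spatial import cKDTree

def f1_score(pred_pts, gt_pts, threshold=0.05):
    """F1 between two (N, 3) / (M, 3) point sets at a distance threshold.

    Voxel grids and meshes are assumed to have been converted
    beforehand by sampling points on their surface.
    """
    # Precision: fraction of predicted points within `threshold`
    # of some ground-truth point.
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts, k=1)
    precision = (d_pred_to_gt < threshold).mean()

    # Recall: fraction of ground-truth points within `threshold`
    # of some predicted point.
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts, k=1)
    recall = (d_gt_to_pred < threshold).mean()

    if precision + recall == 0:
        return 0.0
    # Scaled to 0-100 to match the scores reported above.
    return 100.0 * 2 * precision * recall / (precision + recall)
```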
Varying the number of predicted points (F1@0.05):
100 points: 57.00
1000 points: 86.25
3000 points: 89.32
5000 points: 89.56
7000 points: 90.45
9000 points: 90.38
Varying the number of points in the predicted point cloud, the F1 score increases as more points are used. This makes sense because a denser point cloud can represent the shape in a more fine-grained manner. However, we also see diminishing returns: beyond roughly 1000 points the score improves only marginally (the small dip from 7000 to 9000 points suggests the remaining differences are within evaluation noise), so somewhere above 1000 points is almost sufficient.
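This diminishing return is partly a property of the metric itself, not only of the model: once the sampled points cover the surface at the 0.05 threshold, recall saturates and denser sampling barely helps. A toy illustration of this with a unit sphere as a stand-in shape (all names here are illustrative, self-contained and separate from the sketch above):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def sample_sphere(n):
    """n uniform random points on the unit sphere (a stand-in surface)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

gt = sample_sphere(10000)  # dense reference sampling of the surface
for n in [100, 1000, 3000, 5000, 7000, 9000]:
    pred = sample_sphere(n)
    prec = (cKDTree(gt).query(pred)[0] < 0.05).mean()
    rec = (cKDTree(pred).query(gt)[0] < 0.05).mean()
    # F1 rises steeply at first, then flattens as coverage saturates.
    print(n, 100 * 2 * prec * rec / (prec + rec))
```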
To see how the mesh deforms as training progresses, we created a visualisation tool that runs inference on a given RGB image at a fixed interval (say every 100 iterations) and combines all the obtained mesh renders into a gif, as shown below; the iteration number is displayed in the top-left corner (a code sketch of the tool is given after the figure). This kind of visualisation helps in examining failure cases more closely as training progresses. Here we can infer that for the first few thousand iterations the network improves on the initial mesh initialisation but never really fits the original structure, which might indicate a need for more model capacity or a better initialisation. Furthermore, after around 6000 iterations the mesh starts to get worse, which might indicate overfitting to the training data and hence a failure to generalise to this test mesh.
Left: input image, Middle: ground-truth mesh, Right: visualisation
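A minimal sketch of the visualisation tool, assuming the training loop already dumps one mesh render per checkpoint into a frames/ directory (the directory layout, filename scheme, and use of imageio/PIL are our assumptions):

```python
import glob
from pathlib import Path

import imageio
import numpy as np
from PIL import Image, ImageDraw

def make_training_gif(frame_dir="frames", out_path="mesh_evolution.gif"):
    """Combine per-checkpoint mesh renders into a single gif.

    Assumes renders were saved during training at a fixed interval,
    e.g. frames/000100.png, frames/000200.png, ... (hypothetical
    naming; any scheme that sorts by iteration works).
    """
    frames = []
    for path in sorted(glob.glob(f"{frame_dir}/*.png")):
        img = Image.open(path).convert("RGB")
        # Stamp the iteration number (recovered from the filename)
        # in the top-left corner of the frame.
        ImageDraw.Draw(img).text(
            (10, 10), f"iter {int(Path(path).stem)}", fill="white"
        )
        frames.append(np.asarray(img))
    imageio.mimsave(out_path, frames)

make_training_gif()
```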