The final average F1 scores at a threshold of 0.05 for each representation:

Representation | F1 Score @ 0.05 |
---|---|
Voxel | 72.99 |
Point Cloud | 92.88 |
Mesh | 81.14 |
For the voxel representation, the $32^3$ grid is too coarse to model finer structures like chair legs or a slatted back, which lowers its performance. Meshes and point clouds are better suited to this task, but meshes can be noisy (and self-intersect) when the predicted vertices of some faces are outliers; the resulting erroneous faces lower the F1 score. Point clouds are unordered and carry no connectivity, so a few noisy points do not distort neighboring geometry, and they therefore give the highest F1 scores.
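For reference, the F1 score at a distance threshold can be sketched as below. This is a minimal NumPy sketch of the standard point-set F1 metric (precision over predicted points, recall over ground-truth points), not the assignment's actual evaluation code; the function name and brute-force pairwise distances are my own.

```python
import numpy as np

def f1_at_threshold(pred, gt, tau=0.05):
    """F1 (in percent) between point sets pred (N,3) and gt (M,3) at threshold tau."""
    # Pairwise Euclidean distances between predicted and ground-truth points.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Precision: fraction of predicted points within tau of some GT point.
    precision = (d.min(axis=1) < tau).mean()
    # Recall: fraction of GT points within tau of some predicted point.
    recall = (d.min(axis=0) < tau).mean()
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Identical point sets score 100; predictions far from every ground-truth point score 0.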
Mesh (other parameters at default values) | F1 Score |
---|---|
Chamfer loss weight = 0.1 | 80.689 |
Chamfer loss weight = 1 (default) | 81.14 |
Chamfer loss weight = 5 | 86.205 |
ico_sphere level = 2 | 79.441 |
ico_sphere level = 4 (default) | 81.14 |
ico_sphere level = 5 | 86.613 |
Point Cloud (other parameters at default values) | F1 Score |
---|---|
n_points = 1000 | 81.138 |
n_points = 5000 (default) | 92.88 |
n_points = 10000 | 91.207 |
For meshes, increasing either the Chamfer loss weight or the ico_sphere level gave a higher F1 score. A higher ico_sphere level gives the initial mesh more faces, allowing a finer learned 3D prediction, while a larger Chamfer loss weight enforces a tighter fit to the ground truth.
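The role of the Chamfer weight can be sketched as a weighted sum in the training objective. This is a minimal NumPy sketch under my own assumptions: `w_smooth` and `smooth_term` stand in for whatever regularizer (e.g. Laplacian smoothing) accompanies the Chamfer term, and the function names are hypothetical, not the assignment's code.

```python
import numpy as np

def chamfer(pred, gt):
    """Symmetric squared Chamfer distance between point sets (N,3) and (M,3)."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    return (d.min(axis=1) ** 2).mean() + (d.min(axis=0) ** 2).mean()

def mesh_loss(pred_pts, gt_pts, smooth_term, w_chamfer=1.0, w_smooth=0.1):
    # Raising w_chamfer prioritizes fitting the ground-truth surface
    # over the smoothness regularizer, hence the tighter fit observed above.
    return w_chamfer * chamfer(pred_pts, gt_pts) + w_smooth * smooth_term
```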
For point clouds, a lower n_points (1000) produced a lower F1 score, as expected given the sparsity. However, a higher n_points (10000), though expected to give a higher F1 score, produced a lower score than the default value (5000). This is likely because predicting more points also produces more noisy outliers.
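The outlier explanation can be checked with a small synthetic experiment: extra predicted points far from the object lower precision, and hence F1, even when the accurate points are unchanged. This is an illustrative sketch with made-up data, not the assignment's evaluation.

```python
import numpy as np

def f1(pred, gt, tau=0.05):
    # Point-set F1 (percent) at distance threshold tau.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    p = (d.min(axis=1) < tau).mean()  # precision
    r = (d.min(axis=0) < tau).mean()  # recall
    return 0.0 if p + r == 0 else 200.0 * p * r / (p + r)

rng = np.random.default_rng(0)
gt = rng.uniform(size=(500, 3))
good = gt + rng.normal(scale=0.01, size=gt.shape)          # accurate predictions
outliers = rng.uniform(low=2.0, high=3.0, size=(100, 3))   # far from the object
# Appending outliers leaves recall intact but drops precision, lowering F1.
print(f1(good, gt) > f1(np.vstack([good, outliers]), gt))  # True
```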
To test whether the model truly understood the structures in the 2D images, I tried testing it on images of tables from the internet. Below are the results I obtained:
The results show that the model learns some coarse characteristics, such as the height of the object, but is generally still predicting chairs as outputs. In the first example, the output looks a little flatter than the usual chair outputs, so it may be learning to flatten the top slightly when the input is a table. More generally, though, the model doesn't generalize to other object categories, even similar ones.
I have also tested a few out-of-distribution images of chairs taken from the internet and the results are below:
The model does understand the general outline of a chair, but again, doesn't generalize well to out-of-distribution data.