Jinkun Cao (jinkunc@andrew.cmu.edu)
I implement the voxel fitting loss with the BCELoss interface provided by PyTorch. From left to right, the visualizations are the ground truth mesh and the meshes recovered from the voxel grid before and after fitting.
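For reference, a minimal sketch of such a loss, assuming `voxel_src` holds per-cell occupancy probabilities in (0, 1) (e.g. after a sigmoid) and `voxel_tgt` is the binary ground-truth grid of the same shape; the tensor names are illustrative, not copied from the actual implementation.

```python
import torch

def voxel_loss(voxel_src: torch.Tensor, voxel_tgt: torch.Tensor) -> torch.Tensor:
    # Binary cross-entropy between predicted occupancy probabilities and the
    # 0/1 ground-truth occupancy of every voxel cell.
    criterion = torch.nn.BCELoss()
    return criterion(voxel_src, voxel_tgt.float())
```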
I use the kNN interface provided by PyTorch3D to implement the chamfer loss. From left to right, the visualizations are the ground truth, the point cloud before fitting, and the point cloud after fitting.
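A minimal sketch of a chamfer loss built on PyTorch3D's kNN interface, assuming `point_cloud_src` and `point_cloud_tgt` are batched `(B, N, 3)` and `(B, M, 3)` tensors (names are illustrative):

```python
import torch
from pytorch3d.ops import knn_points

def chamfer_loss(point_cloud_src: torch.Tensor, point_cloud_tgt: torch.Tensor) -> torch.Tensor:
    # Squared distance from each source point to its nearest target point, and vice versa.
    dists_src = knn_points(point_cloud_src, point_cloud_tgt, K=1).dists  # (B, N, 1)
    dists_tgt = knn_points(point_cloud_tgt, point_cloud_src, K=1).dists  # (B, M, 1)
    # Symmetric chamfer distance: average over both directions.
    return dists_src.mean() + dists_tgt.mean()
```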
I add a Laplacian smoothing loss to the mesh fitting. From left to right, the visualizations are the ground truth mesh, the source sphere mesh before fitting, and the mesh after fitting.
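A minimal sketch of the combined fitting objective with the smoothing term; for brevity it uses PyTorch3D's `chamfer_distance` and `mesh_laplacian_smoothing` (the kNN-based chamfer above works the same way), and the weight `w_smooth` is an illustrative hyperparameter, not the value used in this report.

```python
from pytorch3d.loss import chamfer_distance, mesh_laplacian_smoothing
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.structures import Meshes

def mesh_fitting_loss(src_mesh: Meshes, tgt_mesh: Meshes, w_smooth: float = 0.1):
    # Chamfer term on points sampled from both surfaces.
    src_pts = sample_points_from_meshes(src_mesh, num_samples=5000)
    tgt_pts = sample_points_from_meshes(tgt_mesh, num_samples=5000)
    loss_chamfer, _ = chamfer_distance(src_pts, tgt_pts)
    # Uniform Laplacian smoothing regularizer on the deforming mesh.
    loss_smooth = mesh_laplacian_smoothing(src_mesh, method="uniform")
    return loss_chamfer + w_smooth * loss_smooth
```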
I implement the image-to-voxel network as a stack of ConvTranspose3d and ReLU activation layers; the architecture follows the one introduced in lecture (a rough sketch follows the visualizations below). I randomly select some samples from the test set and show them below. The default configuration is a 32x32x32 voxel occupancy grid. From left to right, the visualizations are the input RGB image, the ground truth mesh, and the predicted voxel grid. Because the voxels are sparse, I move the camera closer (distance 3 -> 2) for the prediction visualization to show clearer details.
voxel #0
voxel #100
voxel #200
voxel #300
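For reference, a rough sketch of such a decoder. The image-encoder feature dimension (512) and the channel sizes are assumptions for illustration, not the exact architecture used here.

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 4 * 4 * 4)  # seed a 4x4x4 feature volume
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),  # -> 8^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),   # -> 16^3
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),    # -> 32^3
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        x = self.fc(feats).view(-1, 128, 4, 4, 4)
        logits = self.deconv(x)                    # (B, 1, 32, 32, 32)
        return torch.sigmoid(logits).squeeze(1)    # occupancy probabilities for BCELoss
```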
For this task, I implement a simple stack of linear layers (a sketch follows the visualizations below). Here I include the visualizations of some samples randomly selected from the test set.
point cloud #0
point cloud #100
point cloud #200
point cloud #300
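A minimal sketch of such an image-to-point-cloud decoder: a stack of linear layers that maps an image feature vector to N*3 coordinates. The hidden sizes and `n_points = 5000` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PointCloudDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512, n_points: int = 5000):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, n_points * 3),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Reshape the flat prediction into (B, N, 3) point coordinates.
        return self.mlp(feats).view(-1, self.n_points, 3)
```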
Similar to the network for generating the point cloud, I implement a stack of linear layers for generating the mesh (a sketch follows the visualizations below). As before, I visualize the results of some samples from the test set.
mesh #0
mesh #100
mesh #200
mesh #300
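A rough sketch of one possible design, assuming the linear layers predict per-vertex offsets that deform a template ico-sphere; the hidden sizes and `n_verts = 2562` (the vertex count of a level-4 ico-sphere) are illustrative assumptions rather than the exact setup used here.

```python
import torch
import torch.nn as nn

class MeshDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512, n_verts: int = 2562):
        super().__init__()
        self.n_verts = n_verts
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, n_verts * 3),
            nn.Tanh(),  # keep predicted offsets bounded
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Per-vertex offsets, later added to the template sphere's vertices
        # (e.g. via Meshes.offset_verts) to produce the deformed mesh.
        return self.mlp(feats).view(-1, self.n_verts, 3)
```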
With the given evaluation metric F1@0.05, I test the implemented models with their checkpoints trained for 10,000 steps. The average scores are shown below (a sketch of the metric computation follows the table). The voxel reconstruction model clearly performs the worst, which is consistent with the visualization results.
To analyze the reasons, I think the lack of the chamfer loss and the smoothing loss is one important factor, since the F1 score measures the distance between the predicted points and the ground truth. Moreover, the output voxel grid is small: the sparse 32x32x32 resolution may be another reason.
| | Voxel | Point Cloud | Mesh |
|---|---|---|---|
| Avg Test F1@0.05 score | 51.407 | 89.522 | 88.349 |
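For reference, a hedged sketch of how an F1@0.05 score between a predicted and a ground-truth point cloud can be computed; the assignment's provided evaluation code may differ in details such as the number of sampled points.

```python
import torch
from pytorch3d.ops import knn_points

def f1_score(pred_pts: torch.Tensor, gt_pts: torch.Tensor, thresh: float = 0.05) -> torch.Tensor:
    # pred_pts: (1, N, 3), gt_pts: (1, M, 3)
    d_pred_to_gt = knn_points(pred_pts, gt_pts, K=1).dists.sqrt()  # (1, N, 1)
    d_gt_to_pred = knn_points(gt_pts, pred_pts, K=1).dists.sqrt()  # (1, M, 1)
    # Precision: fraction of predicted points within `thresh` of the ground truth.
    precision = 100.0 * (d_pred_to_gt < thresh).float().mean()
    # Recall: fraction of ground-truth points within `thresh` of the prediction.
    recall = 100.0 * (d_gt_to_pred < thresh).float().mean()
    return 2.0 * precision * recall / (precision + recall + 1e-8)
```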
A remaining key issue with my implementation is the weak performance of the voxel generation model. I tried to analyze the issue. One recognized problem is that, in the ground truth, most positions are empty and only about 5% of the grid cells are occupied. So, one potential way to improve performance is to assign a higher weight to the occupied positions during loss calculation. By default, I set the weight of non-occupied positions to 1 and of occupied positions to 10. I then adjust the weight of the occupied positions and make both quantitative and qualitative analyses (a sketch of the weighted loss follows the table below).
| Weight | 1.0 | 10.0 | 100.0 |
|---|---|---|---|
| Avg. test F1@0.05 | 39.083 | 51.407 | 42.833 |
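A minimal sketch of the re-weighted voxel loss described above, assuming `voxel_src` holds occupancy probabilities and `w_occ` is the occupied-cell weight swept in the table (1.0 / 10.0 / 100.0):

```python
import torch
import torch.nn.functional as F

def weighted_voxel_loss(voxel_src: torch.Tensor, voxel_tgt: torch.Tensor,
                        w_occ: float = 10.0) -> torch.Tensor:
    voxel_tgt = voxel_tgt.float()
    # Per-cell weights: 1 for empty cells, w_occ for occupied cells.
    weights = torch.where(voxel_tgt > 0.5,
                          torch.full_like(voxel_tgt, w_occ),
                          torch.ones_like(voxel_tgt))
    return F.binary_cross_entropy(voxel_src, voxel_tgt, weight=weights)
```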
Surprisingly, I find that both too large and too small a weight decrease performance: the former encourages more false positives and the latter causes more false negatives during voxel prediction.
I make some visualizations and find that, with weight = 1.0, cubify sometimes fails because of the low confidence of the predicted voxels. So I show the results for w=10.0 and w=100.0 here. From left to right, the visualizations are the ground truth mesh and the outputs of the w=10.0 and w=100.0 models.
It is very obvious that when the weighting factor for occupied voxel positions is too large during training, the model outputs many false positives.
The threshold for cubifying voxels is critical. I'd like to study the "hard positions" where the output confidence is only marginally lower or higher than the threshold. As the default threshold is 0.5, I choose to visualize the output positions whose confidence is between 0.4 and 0.5. These are an important signal if we'd like to improve model performance, because the variance usually shows up first in these marginal voxels. From left to right, the visualizations are the ground truth mesh, the overall output with threshold = 0.5, and the marginal voxels.
Through the visualization, we can see that the marginal cases are usually at the boundary of the object and are very sensitive. If we'd like to improve the model performance in a more targeted way, studying the variance of these voxels might be helpful.
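For reference, a hedged sketch of how such marginal voxels can be isolated and cubified for visualization, assuming `voxel_pred` holds per-cell probabilities of shape (1, 32, 32, 32):

```python
import torch
from pytorch3d.ops import cubify

def marginal_voxel_mesh(voxel_pred: torch.Tensor, lo: float = 0.4, hi: float = 0.5):
    # Keep only the cells whose predicted confidence falls in [lo, hi).
    marginal = (voxel_pred >= lo) & (voxel_pred < hi)
    # cubify turns an occupancy grid into a mesh; thresholding the 0/1 mask at 0.5
    # keeps exactly the selected marginal cells.
    return cubify(marginal.float(), thresh=0.5)
```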