GT | Fitted Voxel |
---|---|
![]() | ![]() |

GT | Fitted Point Cloud |
---|---|
![]() | ![]() |

GT | Fitted Mesh |
---|---|
![]() | ![]() |
Training params: `--type vox --num_workers 8 --batch_size 32 --max_iter 3000 --save_freq 500`
RGB Image | Voxel GT | Voxel Predicted |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
Training params: `--type point --num_workers 8 --batch_size 32 --max_iter 3000 --save_freq 500`
RGB Image | Point Cloud GT | Point Cloud Predicted |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
Training params: `--type mesh --num_workers 8 --batch_size 32 --max_iter 5000 --save_freq 500 --w_smooth 0.01`
RGB Image | Mesh GT | Mesh Predicted |
---|---|---|
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
![]() | ![]() | ![]() |
Metric | Voxel | Point Cloud | Mesh |
---|---|---|---|
F1-Score@0.05 | 81.52 | 95.874 | 93.23 |
We observe that the F1-score for the voxel grid is much lower than for the point cloud and the mesh. The voxel representation has a particularly hard time predicting the chair legs.
Intuitively, this makes sense, since predicting a voxel grid requires the network to do more work: it has to output a prediction for every cell of a 32x32x32 (~33K-value) grid, whereas the point cloud and mesh only need to predict about 1.5K-2K values. This suggests that we need to add some regularization or prior to the voxel network so that it can learn from the limited amount of data. In addition, even at this resolution the grid is too coarse to capture the high-fidelity features visible in the input image. Third, voxel-to-voxel correspondence imposes a hard alignment between the source and the ground truth, which may not be desirable.
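As a rough illustration of this asymmetry in output size, the sketch below shows a per-voxel binary cross-entropy objective over a 32x32x32 grid; the tensor names and shapes are assumptions for illustration, not the exact training code.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: the voxel branch emits one occupancy logit per cell of a
# 32x32x32 grid, i.e. 32**3 = 32,768 values per example, versus roughly
# 1.5K-2K coordinates for the point-cloud / mesh branches.
batch_size = 32
voxel_logits = torch.randn(batch_size, 32, 32, 32)                 # decoder output (logits)
voxel_gt = torch.randint(0, 2, (batch_size, 32, 32, 32)).float()   # ground-truth occupancy

# Every one of the ~33K cells is supervised independently, which is part of
# why thin structures such as chair legs are hard to recover.
loss = F.binary_cross_entropy_with_logits(voxel_logits, voxel_gt)
```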
The point cloud has the highest F1-score; intuitively, it is not constrained by the topology of a mesh initialization and can represent arbitrary shapes.
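For reference, the sketch below shows one way an F1-score at a 0.05 distance threshold can be computed between sampled predicted and ground-truth points, assuming PyTorch3D's `knn_points`; the actual evaluation script may differ in details such as point counts and scaling.

```python
import torch
from pytorch3d.ops import knn_points

def f1_score(pred_points: torch.Tensor, gt_points: torch.Tensor, threshold: float = 0.05):
    """F1@threshold between (1, N, 3) predicted and (1, M, 3) ground-truth points."""
    # knn_points returns *squared* distances to the nearest neighbour, hence the sqrt.
    d_pred_to_gt = knn_points(pred_points, gt_points, K=1).dists[..., 0].sqrt()
    d_gt_to_pred = knn_points(gt_points, pred_points, K=1).dists[..., 0].sqrt()
    precision = 100.0 * (d_pred_to_gt < threshold).float().mean()  # % of predictions near GT
    recall = 100.0 * (d_gt_to_pred < threshold).float().mean()     # % of GT covered by predictions
    return 2 * precision * recall / (precision + recall + 1e-8)
```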
In this section, we evaluate the effect of the strength of the smoothness parameter on the generated mesh representations. This set of experiments is run for a total of only 1K iterations.

Hypothesis: increasing the smoothness strength should smooth out the generated mesh. The resulting F1-score might increase or decrease.
`w_smooth` | Prediction | F1-Score@0.05 |
---|---|---|
0.01 | ![]() | 86.95 |
0.1 | ![]() | 85.48 |
1 | ![]() | 82.12 |
10 | ![]() | 77.14 |
We observe that the predicted mesh becomes smoother as `w_smooth` increases, as expected. Interestingly, this comes at the expense of the F1-score, which degrades as the smoothness parameter grows. On inspecting more meshes, those generated by models with a higher `w_smooth` are not diverse and only learn a general shape of the object; on the other hand, a lower `w_smooth` results in very pointy meshes.
We are trying to obtain a 3D model of an object given only a single 2D view. It is therefore interesting to see how the predictions change as the viewpoint of the object changes. We take this to the extreme by flipping the object horizontally, which leads to interesting predictions.
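The flipped inputs were obtained simply by mirroring the image along its width; a minimal sketch (the `model` callable and the channel-last image layout are assumptions):

```python
import torch

@torch.no_grad()
def predict_original_and_flipped(model, rgb_image: torch.Tensor):
    """Run the single-view predictor on an image and on its horizontal mirror.

    Assumes rgb_image is a (B, H, W, 3) batch; for (B, 3, H, W) use dims=[3].
    """
    flipped = torch.flip(rgb_image, dims=[2])  # mirror along the width axis
    return model(rgb_image), model(flipped)
```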
The model is still able to predict pose-invariant chairs. However, we can observe extra, undesirable points on the lower section of the chair, and the model is unable to estimate the width of the chair correctly.
RGB Image View 1 | Prediction View 1 | RGB Image Flipped Horizontally | Prediction View 2 |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() |
![]() | ![]() | ![]() | ![]() |
It is also useful to superimpose the prediction on top of the GT for a fine-grained comparison. From these overlays we can see that even though the predicted shape matches the ground truth, it is offset by a small distance in some places. Interestingly, the prediction cannot model the tilt in the back of the chair, and sometimes the height is off.
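One possible way to build such an overlay is to concatenate the two point sets into a single colored point cloud before rendering; a sketch assuming PyTorch3D and (1, N, 3) point tensors (names are illustrative):

```python
import torch
from pytorch3d.structures import Pointclouds

def superimpose(gt_points: torch.Tensor, pred_points: torch.Tensor) -> Pointclouds:
    """Combine GT (red) and predicted (white) points into one cloud for rendering."""
    points = torch.cat([gt_points, pred_points], dim=1)              # (1, N + M, 3)
    red = torch.tensor([1.0, 0.0, 0.0]).expand_as(gt_points)         # GT color
    white = torch.tensor([1.0, 1.0, 1.0]).expand_as(pred_points)     # prediction color
    colors = torch.cat([red, white], dim=1)
    return Pointclouds(points=points, features=colors)
```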
Red = GT, White = Prediction
RGB Image | Superimposed |
---|---|
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |
![]() | ![]() |