I fit the voxel grid with BCEWithLogitsLoss and used marching cubes to convert it to a mesh before rendering with a PointLight at (0, 0, 0). The resulting mesh and the target mesh look the same.
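A minimal sketch of this loss, assuming the network outputs raw per-voxel occupancy logits for a 32^3 grid (the toy tensors here are illustrative):

```python
import torch
import torch.nn as nn

def voxel_loss(voxel_logits, voxel_gt):
    # BCEWithLogitsLoss fuses a sigmoid with binary cross-entropy,
    # so the decoder can emit raw logits per voxel cell.
    criterion = nn.BCEWithLogitsLoss()
    return criterion(voxel_logits, voxel_gt)

# Toy example: a 32^3 grid of logits against a binary occupancy target.
logits = torch.randn(1, 32, 32, 32)
target = (torch.rand(1, 32, 32, 32) > 0.5).float()
loss = voxel_loss(logits, target)
```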
I fit the point cloud using Chamfer loss, implemented with the function knn_points. The visualized predicted and target point clouds look the same.
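A self-contained sketch of the symmetric Chamfer loss; my implementation uses pytorch3d.ops.knn_points with K=1, but torch.cdist stands in here so the example runs without PyTorch3D:

```python
import torch

def chamfer_loss(pred, gt):
    # Symmetric Chamfer distance: for each predicted point find its nearest
    # ground-truth point and vice versa, then average the squared distances.
    d = torch.cdist(pred, gt)            # (B, N_pred, N_gt) pairwise distances
    loss_p2g = d.min(dim=2).values ** 2  # pred -> gt nearest neighbor
    loss_g2p = d.min(dim=1).values ** 2  # gt -> pred nearest neighbor
    return loss_p2g.mean() + loss_g2p.mean()

pred = torch.randn(2, 1000, 3)
gt = torch.randn(2, 1000, 3)
loss = chamfer_loss(pred, gt)
```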
I added the mesh_laplacian_smoothing loss for mesh fitting. The resulting mesh and target mesh look the same.
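The smoothing term can be sketched with a uniform Laplacian, mirroring pytorch3d.loss.mesh_laplacian_smoothing(meshes, method="uniform"); in training it is scaled by a (hypothetical) weight w_smooth and added to the Chamfer term:

```python
import torch

def uniform_laplacian_smoothing(verts, edges):
    # Uniform Laplacian: each vertex is compared to the mean of its
    # neighbors; the loss is the average norm of that offset.
    neighbor_sum = torch.zeros_like(verts)
    degree = torch.zeros(verts.shape[0], 1)
    for i, j in edges.tolist():
        neighbor_sum[i] += verts[j]; degree[i] += 1
        neighbor_sum[j] += verts[i]; degree[j] += 1
    lap = neighbor_sum / degree.clamp(min=1) - verts
    return lap.norm(dim=1).mean()

# Toy triangle mesh: three vertices, three edges.
verts = torch.tensor([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
edges = torch.tensor([[0, 1], [1, 2], [2, 0]])
smooth = uniform_laplacian_smoothing(verts, edges)
```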
I used a network of transposed Conv3D layers connected by ReLU activations, as demonstrated in the course slides, to predict the voxels.
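A sketch of such a decoder; the latent dimension and channel widths are assumptions, but each ConvTranspose3d with kernel 4, stride 2, padding 1 doubles the spatial resolution, reaching the 32^3 grid:

```python
import torch
import torch.nn as nn

class VoxelDecoder(nn.Module):
    # Hypothetical layer sizes: upsample a latent vector to a 32^3 grid
    # of occupancy logits via transposed 3D convolutions.
    def __init__(self, latent_dim=512):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 2 * 2 * 2)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1),  # 2 -> 4
            nn.ReLU(),
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1),   # 4 -> 8
            nn.ReLU(),
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),    # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),     # 16 -> 32
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 256, 2, 2, 2)
        return self.net(x)  # (B, 1, 32, 32, 32) occupancy logits

z = torch.randn(2, 512)
out = VoxelDecoder()(z)
```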
Here I show visuals of three examples from the test set. For each example I show a render of the predicted 3D voxel grid and a render of the ground truth mesh.
| Voxel 0 | GT Mesh 0 | GT Rendering 0 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| Voxel 100 | GT Mesh 100 | GT Rendering 100 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| Voxel 200 | GT Mesh 200 | GT Rendering 200 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
The decoder I defined is simply a linear layer.
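A minimal sketch of this decoder, assuming a 512-dim latent code and n_points = 1000 (both hypothetical values):

```python
import torch
import torch.nn as nn

class PointDecoder(nn.Module):
    # A single linear layer maps the encoder's latent vector
    # to N three-dimensional point coordinates.
    def __init__(self, latent_dim=512, n_points=1000):
        super().__init__()
        self.n_points = n_points
        self.fc = nn.Linear(latent_dim, n_points * 3)

    def forward(self, z):
        return self.fc(z).view(-1, self.n_points, 3)

z = torch.randn(2, 512)
points = PointDecoder()(z)  # one point cloud per batch element
```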
Here I include visuals of three examples from the test set. For each example I show a render of the predicted 3D point cloud and a render of the ground truth mesh.
| PC 0 | GT Mesh 0 | GT Rendering 0 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| PC 100 | GT Mesh 100 | GT Rendering 100 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| PC 200 | GT Mesh 200 | GT Rendering 200 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
I used a 4-layer MLP connected by ReLU activations, with a final Tanh layer.
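A sketch of this decoder, predicting per-vertex offsets for the initial mesh; the layer widths and latent dimension are assumptions (ico_sphere(4) has 2562 vertices), and the final Tanh bounds each offset coordinate to [-1, 1]:

```python
import torch
import torch.nn as nn

class MeshDecoder(nn.Module):
    # 4-layer MLP with ReLU activations and a final Tanh,
    # predicting per-vertex offsets added to the source mesh.
    def __init__(self, latent_dim=512, n_verts=2562):
        super().__init__()
        self.n_verts = n_verts
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 4096), nn.ReLU(),
            nn.Linear(4096, n_verts * 3), nn.Tanh(),
        )

    def forward(self, z, init_verts):
        offsets = self.mlp(z).view(-1, self.n_verts, 3)
        return init_verts + offsets  # deformed vertex positions

z = torch.randn(2, 512)
init_verts = torch.randn(2562, 3)   # e.g. vertices of ico_sphere(4)
verts = MeshDecoder()(z, init_verts)
```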
I include visuals of three examples from the test set. For each example I show a render of the predicted mesh and a render of the ground truth mesh.
| Mesh 0 | GT Mesh 0 | GT Rendering 0 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| Mesh 100 | GT Mesh 100 | GT Rendering 100 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |

| Mesh 200 | GT Mesh 200 | GT Rendering 200 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
Quantitatively compare the F1 score of 3D reconstruction for meshes vs. point clouds vs. voxel grids. Provide an intuitive explanation justifying the comparison.
Here I report the average test F1 score at a 0.05 threshold for the voxel grid, point cloud, and mesh networks.
Fitting the point cloud gives the most accurate results in terms of F1, and fitting voxels gives the worst. This makes sense: for point cloud fitting, we directly optimize the reconstructed point locations with Chamfer loss, and F1 measures exactly whether predicted points lie close to the ground truth. Mesh fitting also performs reasonably well because we likewise apply Chamfer loss to points sampled from the mesh surface. For voxels, however, we do not directly optimize sampled point locations. Another reason may be that the voxel grid (32x32x32) is too coarse to capture precise surface locations.
| | Average Test F1@0.05 |
| --- | --- |
| Voxel Fitting | 51.014 |
| Point Cloud Fitting | 90.299 |
| Mesh Fitting | 83.621 |
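For reference, the F1@threshold metric on sampled point clouds can be sketched as follows (the grader's exact implementation may differ in details such as distance units):

```python
import torch

def f1_score(pred_points, gt_points, threshold=0.05):
    # A point counts as correct if its nearest neighbor in the other
    # cloud lies within `threshold`. Precision checks predicted -> GT,
    # recall checks GT -> predicted; F1 is their harmonic mean (in %).
    d = torch.cdist(pred_points, gt_points)  # (N_pred, N_gt)
    precision = (d.min(dim=1).values < threshold).float().mean()
    recall = (d.min(dim=0).values < threshold).float().mean()
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)

pts = torch.rand(500, 3)
perfect = f1_score(pts, pts)  # identical clouds -> F1 near 100
```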
Analyse the results by varying a hyperparameter of your choice, for example n_points, vox_size, w_chamfer, or the initial mesh (ico_sphere), etc. Try to be unique and conclusive in your analysis.
I varied the initial mesh shape used for mesh fitting. In addition to ico_sphere(4), I tried initializing from (1) a torus, (2) the cow mesh from assignment 1, and (3) a dolphin mesh downloaded from the PyTorch3D library.
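As an illustration of building an alternative initial shape, a torus can be generated parametrically (the radii and resolution here are hypothetical; in practice the resulting vertices and faces feed a PyTorch3D Meshes object in place of ico_sphere(4)):

```python
import torch

def torus_vertices(R=1.0, r=0.4, n_u=32, n_v=16):
    # Parametric torus: sweep a circle of radius r around a ring of
    # radius R, sampling n_u x n_v vertices on the surface.
    u = torch.linspace(0, 2 * torch.pi, n_u + 1)[:-1]
    v = torch.linspace(0, 2 * torch.pi, n_v + 1)[:-1]
    u, v = torch.meshgrid(u, v, indexing="ij")
    x = (R + r * torch.cos(v)) * torch.cos(u)
    y = (R + r * torch.cos(v)) * torch.sin(u)
    z = r * torch.sin(v)
    return torch.stack([x, y, z], dim=-1).reshape(-1, 3)

verts = torus_vertices()  # (32 * 16, 3) vertex positions
```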
| GT Mesh 0 | Mesh from sphere 0 | Mesh from torus 0 | Mesh from cow 0 | Mesh from dolphin 0 |
| --- | --- | --- | --- | --- |
| ![]() | ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() | ![]() | ![]() |
I also computed the average F1 score. The torus initialization achieves the best result, presumably because its topology is closer to that of chairs, which sometimes have holes:
| | Average Test F1@0.05 |
| --- | --- |
| Mesh from sphere | 83.621 |
| Mesh from torus | 86.503 |
| Mesh from dolphin | 85.418 |
| Mesh from cow | 83.221 |
Simply seeing final predictions and numerical evaluations is not always insightful. Can you create some visualizations that help highlight what your learned model does? Be creative and think of what visualizations would help you gain insights. There is no 'right' answer - although reading some papers to get inspiration might give you ideas.
I plotted the voxels with probability greater than 0.5, colored by their probability values in the JET colormap. The voxels appear well fitted: in particular, the backs and legs of the chairs have very high probability values (red means high, green means low). However, the periphery of the chairs/sofas generally has lower probability values, suggesting that most high-probability voxels concentrate near the object centroid. This may result from a low-capacity decoder that underfits the variety of chair shapes.
| Probability Voxel 0 | Probability Voxel 100 | Probability Voxel 500 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
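The visualization above can be sketched as follows: threshold the occupancy probabilities at 0.5 and map each surviving voxel's probability through the JET colormap (the 8^3 grid and file name here are toy placeholders for the real 32^3 predictions):

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

def plot_prob_voxels(probs, path="prob_voxels.png", thresh=0.5):
    # Keep voxels above the occupancy threshold and color each one by
    # its probability via the JET colormap (red = high, green/blue = low).
    occupied = probs > thresh
    colors = plt.cm.jet(probs)  # (D, H, W, 4) RGBA per voxel
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.voxels(occupied, facecolors=colors)
    fig.savefig(path)
    plt.close(fig)
    return int(occupied.sum())

probs = np.random.rand(8, 8, 8)  # toy probability grid
n = plot_prob_voxels(probs)
```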