predicted voxel | ground truth voxel |
---|---|
![]() | ![]() |

predicted mesh | ground truth mesh |
---|---|
![]() | ![]() |

predicted point | ground truth mesh |
---|---|
![]() | ![]() |

predicted voxel | ground truth voxel | image |
---|---|---|
![]() | ![]() | ![]() |

predicted voxel | ground truth voxel | image |
---|---|---|
![]() | ![]() | ![]() |

predicted voxel | ground truth voxel | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

predicted point | ground truth point | image |
---|---|---|
![]() | ![]() | ![]() |

model | vox | point | mesh |
---|---|---|---|
F1 score | 88.18633 | 87.98031 | 88.85673 |
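
The write-up does not say how the F1 score is computed, so the following is only a minimal sketch of one common definition for comparing a predicted and a ground truth point set: precision and recall are the fractions of points that have a neighbour in the other set within a distance threshold. The function name `f1_score`, the threshold of `0.05`, and the toy points are all assumptions made for illustration, not the evaluation code behind the table.

```python
import math

def f1_score(pred_points, gt_points, threshold=0.05):
    """F1 in percent between two point sets (brute-force nearest neighbours).

    `threshold` is an assumed value, not taken from the report.
    """
    def nearest_dist(p, others):
        return min(math.dist(p, q) for q in others)

    # Precision: share of predicted points close to some ground truth point.
    precision = sum(nearest_dist(p, gt_points) < threshold
                    for p in pred_points) / len(pred_points)
    # Recall: share of ground truth points close to some predicted point.
    recall = sum(nearest_dist(g, pred_points) < threshold
                 for g in gt_points) / len(gt_points)
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)

# Toy example: three of four points match in each direction.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pred = [(0.01, 0.0, 0.0), (1.0, 0.02, 0.0), (0.5, 0.5, 0.5), (0.0, 0.0, 1.0)]
print(f1_score(pred, gt))
```

Under this definition, a stray predicted point lowers precision and a missed ground truth region lowers recall, so the score rewards predictions that are both accurate and complete.
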
I also evaluate the effect of changing `n_points`. Ideally, increasing `n_points` yields a more accurate 3D model, but this is not always true. When `n_points` increases, the number of output channels of the last fully connected layer increases as well. Once `n_points` exceeds a certain threshold, two issues can arise. First, more computation is needed to train the network. Second, sampling more points does not necessarily provide more supervision for the model. In the results below, the F1 score is highest at 4096 points and drops slightly at 6144.

num of points | 2048 | 4096 | 6144 |
---|---|---|---|
F1 score | 85.09808 | 87.98031 | 87.043495 |
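
To make the cost argument concrete, here is a rough sketch of how the last fully connected layer grows with `n_points`, assuming it maps a hidden feature vector to `n_points * 3` coordinates. The hidden size of 512 and the helper name `last_fc_size` are hypothetical; the actual decoder may be shaped differently.

```python
def last_fc_size(n_points, hidden_dim=512):
    """Output width and parameter count of a final linear layer.

    Assumes a layer of shape (hidden_dim -> n_points * 3); hidden_dim=512
    is an illustrative guess, not a value from the report.
    """
    out_features = n_points * 3  # one (x, y, z) triple per predicted point
    params = hidden_dim * out_features + out_features  # weights + biases
    return out_features, params

for n in (2048, 4096, 6144):
    out_features, params = last_fc_size(n)
    print(f"n_points={n}: out_features={out_features}, params={params}")
```

Under these assumptions the parameter count of that layer grows linearly with `n_points`, which matches the first issue described above.
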
It is also interesting to see how the model learns gradually during training, so I visualize the output of the voxel network at several points during training.

without training | 200 iterations | 400 iterations | 2000 iterations |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |

Without training, the prediction is noise and carries no useful information. After 200 iterations, the model starts to learn the overall structure of the chair. After 400 iterations, it has learned some details, such as the shape of the legs and armrests. After 2000 iterations, the model has removed some outliers, and all predicted voxels lie within the extent of the object.
Another thing I want to explore is the meaning of the feature channels in the encoded feature produced by the ResNet encoder. I compare the encoded features of a chair and a sofa below.

sofa chair | chair | |
---|---|---|
![]() | ![]() |
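
One simple way to go beyond a visual comparison is to rank the channels by how much they differ between the two encoded features. The sketch below is an illustration only: the helper names and the short toy vectors are made up, and real encoded features would be much longer.

```python
import math

def top_differing_channels(feat_a, feat_b, k=3):
    """Indices of the k channels with the largest absolute difference."""
    diffs = [abs(a - b) for a, b in zip(feat_a, feat_b)]
    order = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)
    return order[:k]

def cosine_similarity(feat_a, feat_b):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(a * a for a in feat_a))
    norm_b = math.sqrt(sum(b * b for b in feat_b))
    return dot / (norm_a * norm_b)

# Hypothetical 5-channel features standing in for the encoder outputs.
chair_feat = [0.1, 0.9, 0.0, 0.4, 0.7]
sofa_feat = [0.1, 0.2, 0.8, 0.4, 0.6]

print(top_differing_channels(chair_feat, sofa_feat, k=2))
print(cosine_similarity(chair_feat, sofa_feat))
```

Channels that rank high for many chair and sofa pairs would be candidates for encoding category-specific shape cues, while channels that barely change may capture properties the two classes share.
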