Q1.1

predicted voxel ground truth voxel
Alt Text Alt Text

Q1.2

predicted mesh ground truth mesh
Alt Text Alt Text

Q1.3

predicted point ground truth mesh
Alt Text Alt Text

Q2.1

predicted voxel ground truth voxel image
Alt Text Alt Text
predicted voxel ground truth voxel image
Alt Text Alt Text
predicted voxel ground truth voxel image
Alt Text Alt Text

Q2.2

predicted point ground truth point image
Alt Text Alt Text
predicted point ground truth point image
Alt Text Alt Text
predicted point ground truth point image
Alt Text Alt Text

Q2.3

predicted point ground truth point image
Alt Text Alt Text
predicted point ground truth point image
Alt Text Alt Text
predicted point ground truth point image
Alt Text Alt Text

Q2.4

| model | vox | point | mesh |
| --- | --- | --- | --- |
| F1 score | 88.18633 | 87.98031 | 88.85673 |
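
For reference, an F1 score between a predicted and a ground-truth point set is commonly computed as the harmonic mean of precision and recall at a distance threshold. The sketch below is only an illustration of that idea, not the evaluation code that produced the numbers above: the function name `f1_score`, the brute-force nearest-neighbour search, and the threshold of 0.05 are all assumptions.

```python
import math

def f1_score(pred, gt, threshold=0.05):
    """F1 (in percent) between two point sets given as lists of (x, y, z).

    threshold is a hypothetical value; the one used in the report is not stated.
    """
    def nearest(p, pts):
        # Brute-force distance from p to its closest point in pts.
        return min(math.dist(p, q) for q in pts)

    # Precision: share of predicted points that lie close to the ground truth.
    precision = 100.0 * sum(nearest(p, gt) < threshold for p in pred) / len(pred)
    # Recall: share of ground-truth points that lie close to the prediction.
    recall = 100.0 * sum(nearest(g, pred) < threshold for g in gt) / len(gt)

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one of two predicted points matches, one of two GT points is covered.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.01), (5.0, 5.0, 5.0)]
print(f1_score(pred, gt))
```

The brute-force search is quadratic in the number of points, so it is only suitable for small toy inputs like the one shown.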

Q2.5

I evaluate the effect of changing n_points. Ideally, increasing n_points should yield a more accurate 3D model, but this is not always true. As n_points increases, the number of output channels of the last fully connected layer also increases. Once n_points exceeds a certain threshold, two issues can arise. First, more computation is needed to train the network. Second, sampling more points does not necessarily provide more supervision for the model. In the results below, the F1 score improves from 2048 to 4096 points and then drops slightly at 6144.

| num of points | 2048 | 4096 | 6144 |
| --- | --- | --- | --- |
| F1 score | 85.09808 | 87.98031 | 87.043495 |
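
To make the computation argument concrete, the sketch below counts the parameters of a final fully connected layer that outputs `n_points * 3` coordinates. The input width of 512 (`hidden_dim`) and the helper name `last_layer_params` are assumptions for illustration; the actual decoder in this report may differ.

```python
def last_layer_params(n_points: int, hidden_dim: int = 512) -> int:
    """Weights plus biases of a linear layer mapping hidden_dim -> n_points * 3.

    hidden_dim = 512 is a hypothetical input width, not taken from the report.
    """
    out_features = n_points * 3  # one (x, y, z) triple per predicted point
    return hidden_dim * out_features + out_features

for n in (2048, 4096, 6144):
    print(n, last_layer_params(n))
```

Under this assumption the layer grows linearly with n_points, so going from 2048 to 6144 points triples the size of the last layer.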

Q2.6

One interesting thing to look at is how the model learns gradually during training, so I visualize the output of the voxel network at several stages of training.

without training 200 iterations 400 iterations 2000 iterations
Alt Text Alt Text Alt Text Alt Text

Without training, the prediction is noise and carries no useful information. After 200 iterations, the model starts to learn the overall structure of the chair. After 400 iterations, it has learned finer details, such as the shape of the chair's legs and armrests. After 2000 iterations, the model has removed some outliers, and all predicted voxels lie within the extent of the object.

encoded feature

Another interesting thing I want to explore is the meaning of the feature channels in the encoded feature produced by the ResNet encoder. I compare the encoded features of the chair and the sofa below.

Alt Text

sofa chair chair
Alt Text Alt Text
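
One simple way to make such a comparison quantitative is to measure the overall similarity of two encoded features and list the channels where they differ most. The sketch below is a minimal illustration with made-up five-channel vectors; the helper names and the values of `chair_feat` and `sofa_feat` are hypothetical and are not the features shown above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_differing_channels(a, b, k=3):
    """Indices of the k channels with the largest absolute difference."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]

# Hypothetical toy features, not real encoder outputs.
chair_feat = [0.9, 0.1, 0.0, 0.7, 0.2]
sofa_feat = [0.8, 0.6, 0.0, 0.1, 0.2]

print(cosine_similarity(chair_feat, sofa_feat))
print(top_differing_channels(chair_feat, sofa_feat, k=2))
```

Channels that differ strongly between classes but stay stable within a class are candidates for encoding class-specific shape cues.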