Ziyun Xu (ziyunx)
Late days used: 4
I published my website several minutes past 12 pm because the AWS connection was down for about 20 minutes right before I was about to submit. Very sorry about that!
For the detailed visualization code, refer to render.ipynb.
Source:
Target:
Source:
Target:
Source:
Target:
Voxel: aligned with the decoder structure in the paper "Learning a Predictable and Generative Vector Representation for Objects"
self.linear_layer = torch.nn.Linear(512, 216)  # 512-d feature -> 216 = 6^3 seed volume
self.decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose3d(1, 256, kernel_size=3, stride=1),               # 6^3  -> 8^3
    torch.nn.ReLU(),
    torch.nn.ConvTranspose3d(256, 384, kernel_size=3, stride=3, padding=1),  # 8^3  -> 22^3
    torch.nn.ReLU(),
    torch.nn.ConvTranspose3d(384, 256, kernel_size=5, stride=1),             # 22^3 -> 26^3
    torch.nn.ReLU(),
    torch.nn.ConvTranspose3d(256, 96, kernel_size=7, stride=1),              # 26^3 -> 32^3
    torch.nn.ReLU(),
    torch.nn.Conv3d(96, 1, kernel_size=1, stride=1),                         # 32^3 occupancy logits
)
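As a minimal sketch of the corresponding forward pass (the method name, argument name, and the reshape are my assumptions about the surrounding model code, not copied from it): the 512-d image feature is projected to 216 = 6³ values, reshaped into a 1×6×6×6 seed volume, and upsampled by the transposed convolutions to a 32³ grid of occupancy logits.

```python
# Hedged sketch of the voxel decoder forward pass; identifiers are illustrative.
def decode_voxel(self, encoded_feat):      # encoded_feat: (B, 512)
    x = self.linear_layer(encoded_feat)    # (B, 216)
    x = x.view(-1, 1, 6, 6, 6)             # (B, 1, 6, 6, 6) seed volume
    voxels = self.decoder(x)               # (B, 1, 32, 32, 32) occupancy logits
    return voxels
```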
Point Cloud & Mesh: 2-layer MLP
self.decoder = torch.nn.Sequential(
    torch.nn.Linear(512, 1024),               # 512-d image feature -> 1024 hidden units
    torch.nn.GELU(),
    torch.nn.Linear(1024, self.n_point * 3),  # one (x, y, z) triple per predicted point
)
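A minimal sketch of how the MLP output could be reshaped into a point cloud (the method name is an assumption; for the mesh branch the same MLP would instead output n_verts × 3 per-vertex offsets applied to a fixed template, which is likewise an assumption about the surrounding pipeline):

```python
# Hedged sketch; identifiers are illustrative, not the repo's actual API.
def decode_pointcloud(self, encoded_feat):    # encoded_feat: (B, 512)
    points = self.decoder(encoded_feat)       # (B, n_point * 3)
    return points.view(-1, self.n_point, 3)   # (B, n_point, 3) xyz coordinates
```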
Other configuration
All other hyperparameters reuse the default configuration.
Original Image:
Target:
Reconstructed (Voxel):
Reconstructed (Point Cloud):
Reconstructed (Mesh):
Original Image:
Target:
Reconstructed (Voxel):
Reconstructed (Point Cloud):
Reconstructed (Mesh):
Original Image:
Target:
Reconstructed (Voxel):
Reconstructed (Point Cloud):
Reconstructed (Mesh):
The table below shows the F1 scores for the three 3D reconstruction methods. Point cloud scores the highest, while mesh scores the lowest. Voxel performs surprisingly well, probably because I used a more complex 3D CNN structure for this representation, while for point cloud and mesh I only use two-layer MLPs. Point cloud likely performs best because the F1 score is computed directly on point locations, which is exactly what the point-cloud decoder predicts. (A minimal sketch of the F1 computation is given after the table.)
| | Voxel | Point Cloud | Mesh |
| --- | --- | --- | --- |
| F1 | 88.619 | 94.865 | 85.324 |
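For reference, a minimal sketch of how an F1 score between two point sets is typically computed (the threshold value and function name are assumptions; the assignment's own evaluation code may differ):

```python
import torch

def point_f1(pred_pts, gt_pts, threshold=0.05):
    """Hedged sketch: F1 between predicted (N, 3) and ground-truth (M, 3) points."""
    dists = torch.cdist(pred_pts, gt_pts)                             # (N, M) pairwise distances
    precision = (dists.min(dim=1).values < threshold).float().mean()  # predictions near some GT point
    recall = (dists.min(dim=0).values < threshold).float().mean()     # GT points near some prediction
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)
```

Scaling by 100 puts the returned value on the same 0-100 scale as the numbers reported in the table.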
In this question, we vary n_points for the point cloud reconstruction. For faster training, we reduce the number of training iterations to 101 and increase the batch size to 128, so the performance is slightly lower than in Q2.2. Other settings remain unchanged. (A sketch of the sweep script is given after the table below.)
We can see that F1 increases as we increase the number of points, which shows that predicting more points leads to better reconstruction. One potential reason is that the dimension of the final layer grows, so the model is effectively wider in its last layer. It could also be that more predicted points make it more likely for each ground-truth point to have a nearby prediction.
| n_points (k) | 10 | 8 | 6 | 4 | 2 |
| --- | --- | --- | --- | --- | --- |
| F1 | 91.215 | 91.000 | 90.233 | 89.887 | 88.619 |
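Below is a minimal sketch of how this sweep could be scripted; the script name and CLI flags are assumptions about the assignment code rather than its verified interface:

```python
# Hedged sketch of the n_points sweep; train_model.py and its flags are assumed, not verified.
import subprocess

for n_points in [2000, 4000, 6000, 8000, 10000]:
    subprocess.run([
        "python", "train_model.py",
        "--type", "point",
        "--n_points", str(n_points),
        "--batch_size", "128",
        "--max_iter", "101",
    ], check=True)
```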
The gif below shows how the reconstruction evolves as we increase n_points. It clearly shows that as n_points increases, the point cloud adds more detail to the 3D object (such as the chair legs). We can also observe points clustering in some regions, which may indicate that there are redundant points when n_points is too large.
idx == 0:
idx == 200:
idx == 300: