Name: Manikandtan Chiral Kunjunni Kartha
Andrew ID: mchiralk
Supporting files: Google Drive
Goals: In this assignment, you will explore the types of loss and decoder functions for regressing to voxel, point-cloud, and mesh representations from single-view RGB input.
In this subsection, we will define a binary cross-entropy loss that can help us fit a 3D binary voxel grid. Define the loss function in the `losses.py` file. For this you can use the pre-defined losses in the PyTorch library.
```python
import torch

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src holds raw logits; squash them to probabilities before
    # comparing against the binary target occupancy grid
    sigmoid = torch.nn.Sigmoid()
    loss = torch.nn.BCELoss()(sigmoid(voxel_src), voxel_tgt)
    return loss
```
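An equivalent and more numerically stable variant (a minor side note, not required by the assignment) fuses the sigmoid into the loss with `BCEWithLogitsLoss`:

```python
import torch.nn as nn

def voxel_loss_stable(voxel_src, voxel_tgt):
    # BCEWithLogitsLoss applies the sigmoid internally in log-space,
    # avoiding saturation for large-magnitude logits
    return nn.BCEWithLogitsLoss()(voxel_src, voxel_tgt)
```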
python main.py --q 1.1
In this subsection, we will define the chamfer loss that can help us fit a 3D point cloud. Define the loss function in the `losses.py` file. We expect you to write your own code for this and not use any PyTorch3D utilities, except that you are allowed to use functions inside `pytorch3d.ops.knn` such as `knn_gather` or `knn_points`.
```python
import torch
import pytorch3d.ops.knn as knn

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # squared distance from each point to its nearest neighbour in the other cloud
    dist1, _, _ = knn.knn_points(point_cloud_src, point_cloud_tgt, K=1)
    dist2, _, _ = knn.knn_points(point_cloud_tgt, point_cloud_src, K=1)
    # symmetric chamfer loss: average the nearest-neighbour distances from both
    # directions (concatenate along the point dimension so clouds of different
    # sizes also work)
    combined = torch.cat([dist1, dist2], dim=1)
    loss_chamfer = combined.mean()
    return loss_chamfer
```
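As a quick sanity check (the shapes here are hypothetical), the loss should be near zero for identical clouds and grow as they separate:

```python
import torch

src = torch.rand(2, 1000, 3)         # batch of 2 clouds, 1000 points each
tgt = src.clone()
print(chamfer_loss(src, tgt))        # ~0: every point has an exact match
print(chamfer_loss(src + 0.5, tgt))  # larger: the clouds are offset
```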
python main.py --q 1.2
In this subsection, we will define an additional smoothing loss that can help us fit a mesh. Define the loss function in the `losses.py` file. For this you can use the pre-defined losses in the PyTorch3D library.
```python
from pytorch3d.loss import mesh_laplacian_smoothing

def smoothness_loss(mesh_src):
    # penalize large Laplacian magnitudes, i.e. vertices far from the centroid
    # of their neighbours, which encourages a smooth surface
    loss_laplacian = mesh_laplacian_smoothing(mesh_src)
    return loss_laplacian
```
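During mesh fitting, this smoothness term is weighted against the chamfer loss on points sampled from the predicted and target meshes. A minimal sketch of how the total mesh loss might be assembled (the weight `w_smooth`, the sample count, and the helper name `mesh_loss` are my assumptions, not prescribed by the starter code):

```python
from pytorch3d.ops import sample_points_from_meshes

def mesh_loss(mesh_src, mesh_tgt, w_smooth=0.1):
    # sample point clouds from both meshes and compare them with chamfer,
    # then regularize with the Laplacian smoothness term
    pts_src = sample_points_from_meshes(mesh_src, num_samples=5000)
    pts_tgt = sample_points_from_meshes(mesh_tgt, num_samples=5000)
    return chamfer_loss(pts_src, pts_tgt) + w_smooth * smoothness_loss(mesh_src)
```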
python main.py --q 1.3
In this subsection, we will define a neural network to decode binary voxel grids.
```python
import torch.nn as nn

class voxelNet(nn.Module):
    def __init__(self, input_dim, output_dim, device):
        super().__init__()
        # single linear layer mapping the image feature to 32^3 occupancy logits
        self.fc1 = nn.Linear(input_dim, output_dim, device=device)

    def forward(self, x):
        b = x.shape[0]
        out = self.fc1(x)
        # reshape the flat prediction into a (B, 1, 32, 32, 32) voxel grid
        out = out.reshape((b, 1, 32, 32, 32))
        return out
```
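A quick shape check (the 512-dim feature is an assumption matching a ResNet18-style encoder; the single layer must output 32³ = 32768 values per sample):

```python
import torch

net = voxelNet(input_dim=512, output_dim=32 * 32 * 32, device="cpu")
feat = torch.rand(4, 512)  # hypothetical batch of encoder features
print(net(feat).shape)     # torch.Size([4, 1, 32, 32, 32])
```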
python main.py --q 2.1
| Image | 3D model (predicted, GT) |
| --- | --- |
In this subsection, we will define a neural network to decode point clouds.
```python
import torch.nn as nn

class pointNet(nn.Module):
    def __init__(self, input_dim, num_points, device):
        super().__init__()
        self.num_points = num_points
        # two linear layers expand the image feature to 3 coordinates per point
        self.fc1 = nn.Linear(input_dim, num_points, device=device)
        self.fc2 = nn.Linear(num_points, num_points * 3, device=device)
        self.relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        b = x.shape[0]
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        # reshape the flat prediction into a (B, num_points, 3) point cloud
        out = out.reshape((b, self.num_points, 3))
        return out
```
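As before, a quick shape check under the same assumed 512-dim feature input:

```python
import torch

net = pointNet(input_dim=512, num_points=1000, device="cpu")
feat = torch.rand(4, 512)  # hypothetical batch of encoder features
print(net(feat).shape)     # torch.Size([4, 1000, 3])
```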
python main.py --q 2.2
| Image | 3D model (predicted, GT) |
| --- | --- |
In this subsection, we will define a neural network to decode meshes.
```python
import torch.nn as nn

class MeshNet(nn.Module):
    def __init__(self, input_dim, num_verts, device):
        super().__init__()
        self.num_verts = num_verts
        # a deeper MLP than the point decoder, expanding the image feature
        # to 3 offset coordinates per mesh vertex
        self.fc1 = nn.Linear(input_dim, 4096, device=device)
        self.fc2 = nn.Linear(4096, 4096, device=device)
        self.fc3 = nn.Linear(4096, num_verts, device=device)
        self.fc4 = nn.Linear(num_verts, num_verts * 3, device=device)
        self.relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        b = x.shape[0]
        out = self.relu(self.fc1(x))
        out = self.relu(self.fc2(out))
        out = self.relu(self.fc3(out))
        out = self.relu(self.fc4(out))
        # flat (B, num_verts * 3) offsets, later applied to the source mesh
        out = out.reshape((b, self.num_verts * 3))
        return out
```
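The decoder predicts flat per-vertex offsets rather than absolute coordinates. A minimal sketch of how they might be applied to the initial mesh (the `ico_sphere` initialization, the 512-dim feature, and the single-sample batch handling are my assumptions about the pipeline, not the starter code itself):

```python
import torch
from pytorch3d.utils import ico_sphere

mesh_pred = ico_sphere(4, "cpu")               # hypothetical unit-sphere init
num_verts = mesh_pred.verts_packed().shape[0]  # 2562 vertices at level 4
net = MeshNet(input_dim=512, num_verts=num_verts, device="cpu")

feat = torch.rand(1, 512)                      # assumed single image feature
offsets = net(feat).reshape(-1, 3)             # packed (num_verts, 3) offsets
deformed = mesh_pred.offset_verts(offsets)     # returns a new, deformed Meshes
```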
python main.py --q 2.3
| Image | 3D model (predicted, GT) |
| --- | --- |
Quantitatively compare the F1 scores of 3D reconstruction for meshes vs. point clouds vs. voxel grids.
| Type | F1 score @ 0.05 |
| --- | --- |
| vox | 72.615 |
| point | 92.762 |
| mesh | 86.090 |
python main.py --q 2.4
Voxel grids have the lowest F1 score. The grid resolution is 32×32×32, which is quite coarse for thin or narrow sections of the model, so a deviation in even a few voxels can cause a large error.

Point clouds have the highest F1 score. Unlike meshes and voxels, there are no intermediate conversions, which helps the points score better.

Meshes score lower than point clouds. This could be because of constraints on vertex neighborhoods and face connectivity. Thin and narrow mesh shapes, especially chair legs, are also inherently more difficult to generate and could lower the score.
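For reference, F1 @ 0.05 treats reconstruction as point-set retrieval: precision is the fraction of predicted points within 0.05 of some ground-truth point, and recall the reverse. A minimal sketch of the metric (the starter code has its own implementation; the names here are mine):

```python
import torch
from pytorch3d.ops import knn_points

def f1_score(pts_pred, pts_gt, threshold=0.05):
    # knn_points returns squared distances, so compare against threshold**2
    d_pred = knn_points(pts_pred, pts_gt, K=1).dists.squeeze(-1)
    d_gt = knn_points(pts_gt, pts_pred, K=1).dists.squeeze(-1)
    precision = (d_pred < threshold ** 2).float().mean()
    recall = (d_gt < threshold ** 2).float().mean()
    # harmonic mean, reported as a percentage in the table above
    return 100 * 2 * precision * recall / (precision + recall + 1e-8)
```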
Analyse the results by varying a hyperparameter of your choice.
I experimented with different initializations for the mesh representation: I tried a torus and a chair model from Free3D, and visualized the results.
```python
# torus initialization, from pytorch3d.utils
mesh_pred = torus(0.5, 1.5, 10, 10, self.device)
```
| Image | 3D model (predicted, GT) |
| --- | --- |
```python
# chair initialization, loaded with pytorch3d.io.load_objs_as_meshes
mesh_pred = load_objs_as_meshes(['Chair_wooden.obj'], self.device)
```
| Image | 3D model (predicted, GT) |
| --- | --- |
I ran a t-SNE visualization on the features from the final fully connected layer of the pointNet decoder. I wanted to see whether the t-SNE distribution of the feature embedding would follow any recognizable pattern. Below is the t-SNE visualization.

What I observed is that while there is high-level clustering based on object color, many of the structural features of the chairs were similar for representations close to one another.

Clusters in the top right share a squared-off back design as a common theme. Similarly, chairs with flat backs, arches on top, and different spokes for the base can be seen at the bottom. Some clusters also seem to form with respect to their orientation in the image.
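For completeness, a sketch of how such a visualization can be produced with scikit-learn (capturing the activations with a forward hook on `fc2` is an assumption about my setup; the random features below are placeholders so the snippet runs standalone):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# placeholder features; in the real run these are activations collected from
# the decoder's final fully connected layer, one row per evaluation image
feats = np.random.rand(500, 3000).astype(np.float32)

emb = TSNE(n_components=2, perplexity=30).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], s=5)
plt.title("t-SNE of pointNet decoder features")
plt.show()
```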