Name: Sri Nitchith Akula
Andrew ID: srinitca
Implementation
```python
import torch.nn as nn

def voxel_loss(voxel_src, voxel_tgt):
    # Binary cross-entropy on raw logits against binary occupancy targets.
    loss = nn.BCEWithLogitsLoss()
    prob_loss = loss(voxel_src, voxel_tgt)
    return prob_loss
```
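As a quick sanity check (hypothetical tensors, not part of the assignment code), the loss should be near zero when the logits agree with the binary occupancy targets and large when they are inverted:

```python
import torch

# Random binary occupancy grid as a stand-in for a ground-truth voxelization.
targets = (torch.rand(2, 1, 32, 32, 32) > 0.5).float()
# Confident logits that agree with the targets: +10 where occupied, -10 where empty.
good_logits = (targets * 2 - 1) * 10
print(voxel_loss(good_logits, targets).item())   # close to 0
print(voxel_loss(-good_logits, targets).item())  # close to 10
```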
Run Command
```
python main.py -q q1.1
```
| Optimized Voxel Grid | Ground Truth Voxel Grid |
|---|---|
| ![]() | ![]() |
Implementation
```python
import torch
import pytorch3d.ops

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # knn_points returns (dists, idx, nn); [0] is each point's squared
    # distance to its nearest neighbor in the other cloud.
    knn_1 = pytorch3d.ops.knn.knn_points(point_cloud_src, point_cloud_tgt)[0]
    knn_2 = pytorch3d.ops.knn.knn_points(point_cloud_tgt, point_cloud_src)[0]
    loss_chamfer = torch.sum(knn_1) + torch.sum(knn_2)  # symmetric chamfer
    return loss_chamfer
```
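For intuition (hypothetical data), the loss vanishes when the two clouds coincide; note that because the per-point distances are summed rather than averaged, the value scales with the number of points:

```python
import torch

src = torch.rand(1, 1000, 3)                  # one cloud of 1000 random points
print(chamfer_loss(src, src.clone()).item())  # ~0: every point finds itself
print(chamfer_loss(src, src + 0.1).item())    # > 0: target shifted by 0.1
```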
Run Command
```
python main.py -q q1.2
```
| Optimized Point Cloud | Ground Truth Point Cloud |
|---|---|
| ![]() | ![]() |
Implementation
```python
import pytorch3d.loss

def smoothness_loss(mesh_src):
    # Laplacian smoothing: penalizes jagged vertex neighborhoods.
    loss_laplacian = pytorch3d.loss.mesh_laplacian_smoothing(mesh_src)
    return loss_laplacian
```
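For intuition (a hypothetical check, assuming `pytorch3d.utils.ico_sphere`), a regular shape already has a small Laplacian loss, while randomly perturbing its vertices increases it:

```python
import torch
from pytorch3d.utils import ico_sphere

mesh = ico_sphere(4)
print(smoothness_loss(mesh).item())   # small: the sphere is already smooth
noisy = mesh.offset_verts(0.05 * torch.randn(mesh.verts_packed().shape))
print(smoothness_loss(noisy).item())  # larger: the jagged surface is penalized
```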
Run Command
```
python main.py -q q1.3
```
| Optimized Mesh | Ground Truth Mesh |
|---|---|
| ![]() | ![]() |
Architecture
```python
import torch.nn as nn

class VoxNet(nn.Module):
    def __init__(self, feature_size):
        super(VoxNet, self).__init__()
        # Two fc layers decode the encoded image feature vector
        # into a 32x32x32 grid of occupancy logits.
        self.fc1 = nn.Linear(feature_size, 1024)
        self.relu = nn.PReLU()
        self.fc2 = nn.Linear(1024, 32 * 32 * 32)

    def forward(self, x):
        b = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = x.reshape((b, 1, 32, 32, 32))  # (batch, channel, D, H, W)
        return x
```
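A quick shape check (the feature size of 512 is an assumption for illustration; the actual value depends on the image encoder used):

```python
import torch

model = VoxNet(feature_size=512)  # 512 assumed, e.g. a ResNet-18 feature vector
feats = torch.randn(4, 512)       # batch of 4 image features
print(model(feats).shape)         # torch.Size([4, 1, 32, 32, 32]) occupancy logits
```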
Run Command
```
python main.py -q q2.1
```
| # | Input RGB Image | Predicted 3D Voxel Grid | Ground Truth Mesh |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
Architecture
```python
import torch.nn as nn

class PointNet(nn.Module):
    def __init__(self, features, num_verts):
        super(PointNet, self).__init__()
        self.num_verts = num_verts
        # Three fc layers decode the image feature vector into
        # num_verts independent 3D point coordinates.
        self.fc1 = nn.Linear(features, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, num_verts * 3)
        self.relu = nn.PReLU()

    def forward(self, x):
        n = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = x.reshape((n, self.num_verts, 3))  # (batch, num_points, xyz)
        return x
```
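A corresponding shape check (feature size and point count are assumed values for illustration):

```python
import torch

model = PointNet(features=512, num_verts=1000)  # sizes assumed
feats = torch.randn(4, 512)
print(model(feats).shape)  # torch.Size([4, 1000, 3]): one 3D point per vertex
```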
Run Command
```
python main.py -q q2.2
```
| # | Input RGB Image | Predicted 3D Point Cloud | Ground Truth Mesh |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
Architecture
```python
import torch.nn as nn

class MeshNet(nn.Module):
    def __init__(self, features, num_verts):
        super(MeshNet, self).__init__()
        self.num_verts = num_verts
        # Five fc layers decode the image feature vector into one
        # 3D coordinate per vertex of the template mesh.
        self.fc1 = nn.Linear(features, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 4096)
        self.fc4 = nn.Linear(4096, 4096)
        self.fc5 = nn.Linear(4096, num_verts * 3)
        self.relu = nn.LeakyReLU()
        self.tanh = nn.Tanh()

    def forward(self, x):
        n = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        x = self.relu(x)
        x = self.fc5(x)
        x = self.tanh(x)  # bound each coordinate to [-1, 1]
        x = x.reshape((n, self.num_verts, 3))
        return x
```
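The final `tanh` bounds each predicted coordinate to [-1, 1], which suits treating the output as per-vertex offsets on an initial sphere. A minimal sketch of that wiring, assuming the outputs are added to an ico-sphere template (the sphere-deformation setup is discussed below; the feature size is assumed):

```python
import torch
from pytorch3d.utils import ico_sphere

src_mesh = ico_sphere(4)                            # template sphere to deform
num_verts = src_mesh.verts_packed().shape[0]        # 2562 vertices at level 4
model = MeshNet(features=512, num_verts=num_verts)  # 512 assumed
feats = torch.randn(1, 512)
offsets = model(feats)                              # (1, num_verts, 3), each in [-1, 1]
pred_mesh = src_mesh.offset_verts(offsets[0])       # deformed mesh prediction
```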
Run Command
```
python main.py -q q2.3
```
| # | Input RGB Image | Predicted 3D Mesh | Ground Truth Mesh |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
Run Command
```
python main.py -q q2.4
```
| Type | F1 Score |
|---|---|
| Voxel grid | 80.226 |
| Point Cloud | 93.021 |
| Mesh | 85.142 |
Point Cloud: The point cloud obtains the highest F1 score. This is understandable because the network predicts the points directly and independently of one another, so it is easier to place at least a few points near every part of the chair.

Mesh: Unlike point clouds, the predicted mesh is constrained by neighboring vertices and face connectivity. If the ground-truth chair has thin structures, it is harder to deform the initial sphere to represent them well, and a chair with holes cannot be represented at all by deforming a sphere. This yields a lower F1 score.

Voxel grid: This has the lowest F1 score of all. To compute the F1 score we first have to extract a mesh from the predicted voxel grid and then sample points from it, so any errors in the voxel grid propagate to the mesh and decrease the F1 score further. In q2.1, the second image shows a thin chair whose legs are missing from the network output. We also use a fixed 32x32x32 resolution to represent every chair; for chairs with thin parts, increasing the prediction resolution should help.
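For reference, a minimal sketch of how such an F1 score between sampled point clouds can be computed (my reconstruction of the metric; the assignment's evaluation code may differ in threshold and details):

```python
import torch
from pytorch3d.ops import knn_points

def f1_score(pred_points, gt_points, threshold=0.05):
    # pred_points, gt_points: (1, N, 3) clouds sampled from the two shapes.
    d_pred = knn_points(pred_points, gt_points).dists[..., 0].sqrt()
    d_gt = knn_points(gt_points, pred_points).dists[..., 0].sqrt()
    precision = (d_pred < threshold).float().mean()  # predicted points near GT
    recall = (d_gt < threshold).float().mean()       # GT points that are covered
    return (100.0 * 2 * precision * recall / (precision + recall + 1e-8)).item()
```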
`batch_size >= 8` obtains good results for the point, mesh and vox networks. I experimented with different `w_smooth` values, and parameterized `w_chamfer` with `n_points` in `train_model.py` to be able to do analysis on varying `n_points` as well. The table below compares reconstructions for three `w_smooth` values, training them for 10000 iterations. (A sketch of how these weights enter the training loss follows the table.)

| # | w_smooth = 0.1 | w_smooth = 1.5 | w_smooth = 4 |
|---|---|---|---|
| 1 | ![]() | ![]() | ![]() |
| 2 | ![]() | ![]() | ![]() |
| 3 | ![]() | ![]() | ![]() |
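The sketch referenced above: how `w_chamfer` and `w_smooth` weight the two loss terms when training the mesh decoder (the function name and sample count are illustrative, not the actual training code):

```python
from pytorch3d.ops import sample_points_from_meshes

def mesh_training_loss(pred_mesh, gt_points, w_chamfer=1.0, w_smooth=0.1):
    # Sample points from the predicted mesh so the chamfer term can compare
    # it with the ground-truth cloud; add the Laplacian term scaled by w_smooth.
    pred_points = sample_points_from_meshes(pred_mesh, num_samples=5000)
    return (w_chamfer * chamfer_loss(pred_points, gt_points)
            + w_smooth * smoothness_loss(pred_mesh))
```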
Observations:

As we increase `w_smooth`, the smoothness of the final reconstruction improves and the mesh better takes the form of the ground truth object.

Run Command
```
python main.py -q q2.6
```
I interpreted the Point Cloud Decoder model using its feature embeddings. I wanted to test the following hypothesis:

To test this hypothesis, I followed the steps below for five representative chairs from the test set:
| Type | Image idx in test set |
|---|---|
| Arm Chair | 0 |
| Dining Chair | 1 |
| Club Chair (large seat space) | 120 |
| Chair with base support | 140 |
| Chair with slant legs | 165 |
1. Pass the complete test dataset through the network and extract the feature embeddings of the penultimate fc layer.
2. For each of the above 5 images, find the 4 closest images in the feature embedding space (a sketch of this retrieval follows this list).
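A minimal sketch of steps 1 and 2 (the names `model` and `test_features` are illustrative stand-ins, not the actual assignment code), using a forward hook to grab the penultimate fc activations:

```python
import torch

embeddings = []
def hook(module, inputs, output):
    embeddings.append(output.detach())

# fc2 is the penultimate fully connected layer of the PointNet decoder above.
handle = model.fc2.register_forward_hook(hook)
with torch.no_grad():
    for feats in test_features:  # hypothetical iterable of encoded test images
        model(feats)
handle.remove()
emb = torch.cat(embeddings)      # (num_test_images, 4096)

# For a query index q, retrieve the 4 nearest images in embedding space.
q = 120
dists = torch.cdist(emb[q:q + 1], emb)[0]
nearest = dists.topk(5, largest=False).indices[1:]  # drop the query itself
```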
Observations: