16-889 Assignment 2: Single View to 3D

Name: Sri Nitchith Akula
Andrew ID: srinitca

Grace days used: 2

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Implementation

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src: predicted occupancy logits; voxel_tgt: binary ground-truth grid
    loss = nn.BCEWithLogitsLoss()
    prob_loss = loss(voxel_src, voxel_tgt)
    return prob_loss
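A minimal sketch of how this loss can drive the fitting loop (all names other than voxel_loss are assumptions; the actual loop is driven by main.py):

import torch

def fit_voxel_grid(voxel_tgt, n_iter=1000, lr=1e-2):
    # Optimize raw logits so that sigmoid(logits) matches the target occupancy.
    voxel_src = torch.randn(voxel_tgt.shape, requires_grad=True)
    optimizer = torch.optim.Adam([voxel_src], lr=lr)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = voxel_loss(voxel_src, voxel_tgt)  # BCE-with-logits loss from above
        loss.backward()
        optimizer.step()
    return torch.sigmoid(voxel_src)  # occupancy probabilities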

Run Command

python main.py -q q1.1
(Figure: Optimized Voxel Grid | Ground Truth Voxel Grid)

1.2. Fitting a point cloud (10 points)

Implementation

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # knn_points (K=1 by default) returns squared distances to the nearest neighbor
    knn_1 = pytorch3d.ops.knn.knn_points(point_cloud_src, point_cloud_tgt)[0]
    knn_2 = pytorch3d.ops.knn.knn_points(point_cloud_tgt, point_cloud_src)[0]
    loss_chamfer = torch.sum(knn_1) + torch.sum(knn_2)
    return loss_chamfer
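In effect this is the summed, two-sided Chamfer distance (with the default K=1, knn_points returns squared nearest-neighbor distances):

d_{\text{Chamfer}}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2^2 + \sum_{y \in S_2} \min_{x \in S_1} \lVert x - y \rVert_2^2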

Run Command

python main.py -q q1.2
(Figure: Optimized Point Cloud | Ground Truth Point Cloud)

1.3. Fitting a mesh (5 points)

Implementation

def smoothness_loss(mesh_src):
    # Uniform Laplacian smoothing regularizer from PyTorch3D
    loss_laplacian = pytorch3d.loss.mesh_laplacian_smoothing(mesh_src)
    return loss_laplacian
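With PyTorch3D's default "uniform" method, this regularizer roughly penalizes the distance between each vertex and the mean of its 1-ring neighbors:

L_{\text{smooth}} = \frac{1}{|V|} \sum_{i \in V} \Big\lVert \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} (v_j - v_i) \Big\rVert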

Run Command

python main.py -q q1.3
(Figure: Optimized Mesh | Ground Truth Mesh)

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

Architecture

class VoxNet(nn.Module):
    def __init__(self, feature_size):
        super(VoxNet, self).__init__()
        self.fc1 = nn.Linear(feature_size, 1024)
        self.relu = nn.PReLU()
        self.fc2 = nn.Linear(1024, 32 * 32 * 32)

    def forward(self, x):
        b = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = x.reshape((b, 1, 32, 32, 32))
        return x
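A minimal usage sketch, assuming a 512-dimensional global image feature from the encoder (the feature size and variable names are assumptions, not part of the architecture above):

import torch

decoder = VoxNet(feature_size=512)        # e.g. a ResNet-18 global feature
feats = torch.randn(4, 512)               # placeholder (batch, feature_size) features
logits = decoder(feats)                   # (4, 1, 32, 32, 32) occupancy logits
occupancy = torch.sigmoid(logits) > 0.5   # thresholded binary voxel grid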

Run Command

python main.py -q q2.1
(Figures 1-3: Input RGB Image | Predicted 3D Voxel Grid | Ground Truth Mesh)

2.2. Image to point cloud (15 points)

Architecture

class PointNet(nn.Module):
    def __init__(self, features, num_verts):
        super(PointNet, self).__init__()
        self.num_verts = num_verts
        self.fc1 = nn.Linear(features, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, num_verts * 3)
        self.relu = nn.PReLU()

    def forward(self, x):
        n = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = x.reshape((n, self.num_verts, 3))
        return x
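A minimal sketch of one training step for this decoder, assuming image features and ground-truth points sampled from the target mesh are already available (function and variable names here are illustrative):

import torch

def train_step(decoder, feats, gt_points, optimizer):
    # feats: (B, features) image features; gt_points: (B, M, 3) points from the GT mesh
    pred_points = decoder(feats)                 # (B, num_verts, 3) predicted point cloud
    loss = chamfer_loss(pred_points, gt_points)  # Chamfer loss from section 1.2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()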

Run Command

python main.py -q q2.2
(Figures 1-3: Input RGB Image | Predicted 3D Point Cloud | Ground Truth Mesh)

2.3. Image to mesh (15 points)

Architecture

class MeshNet(nn.Module):
    def __init__(self, features, num_verts):
        super(MeshNet, self).__init__()
        self.num_verts = num_verts
        self.fc1 = nn.Linear(features, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 4096)
        self.fc4 = nn.Linear(4096, 4096)
        self.fc5 = nn.Linear(4096, num_verts * 3)
        self.relu = nn.LeakyReLU()
        self.tanh = nn.Tanh()

    def forward(self, x):
        n = x.shape[0]
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.fc4(x)
        x = self.relu(x)
        x = self.fc5(x)
        x = self.tanh(x)
        x = x.reshape((n, self.num_verts, 3))
        return x
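Because the final tanh bounds the outputs to [-1, 1], the decoder predicts per-vertex offsets. A minimal sketch of how such offsets would deform a template ico-sphere, assuming PyTorch3D's ico_sphere initialization (the level, sample count, and variable names are assumptions):

import torch
import pytorch3d.ops
import pytorch3d.utils

decoder = MeshNet(features=512, num_verts=2562)   # ico_sphere(level=4) has 2562 vertices
mesh_src = pytorch3d.utils.ico_sphere(level=4)    # template mesh to deform
feats = torch.randn(1, 512)                       # placeholder image features
offsets = decoder(feats).reshape(-1, 3)           # (num_verts, 3) vertex offsets
mesh_pred = mesh_src.offset_verts(offsets)        # deformed mesh prediction
pred_pts = pytorch3d.ops.sample_points_from_meshes(mesh_pred, 5000)  # points for the Chamfer term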

Run Command

python main.py -q q2.3
(Figures 1-3: Input RGB Image | Predicted 3D Mesh | Ground Truth Mesh)

2.4. Quantitative comparisons (10 points)

Run Command

python main.py -q q2.4
Type          F1 Score
Voxel grid    80.226
Point Cloud   93.021
Mesh          85.142

Point Cloud: We notice that the point cloud has the highest F1 score. This is understandable because the network predicts points directly and independently of each other, so it is easier to place at least a few points near every part of the chair.

Mesh: Unlike point clouds, the predicted vertices are constrained by neighboring vertices and face connectivity. If the original chair has thin structures, it is harder for the mesh to deform the initial sphere to represent them well, and if the chair has holes, they cannot be represented at all by deforming a sphere (the topology is fixed). Hence the lower F1 score.

Voxel: This has the lowest F1 score of all. To compute the F1 score we first have to extract a mesh from the voxel grid and then sample points from it, so any errors in the voxel grid carry over to the mesh and decrease the F1 score further. In Q2.1, the second example is a thin chair and the network output is missing its legs. Also, we use a fixed 32x32x32 resolution to represent every chair; for chairs with thin parts, increasing the prediction resolution would help.
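For reference, a minimal sketch of the F1 computation described above (the distance threshold and names are assumptions; the graded evaluation code may differ):

# Precision: fraction of predicted points within a threshold of some ground-truth
# point; recall: the same with roles swapped; F1 is their harmonic mean.
import torch
import pytorch3d.ops

def f1_score(pred_points, gt_points, threshold=0.05):
    # pred_points: (1, N, 3), gt_points: (1, M, 3); knn_points gives squared distances
    d_pred_to_gt = pytorch3d.ops.knn_points(pred_points, gt_points).dists.sqrt()
    d_gt_to_pred = pytorch3d.ops.knn_points(gt_points, pred_points).dists.sqrt()
    precision = (d_pred_to_gt < threshold).float().mean()
    recall = (d_gt_to_pred < threshold).float().mean()
    return 100.0 * 2 * precision * recall / (precision + recall + 1e-8)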

2.5. Analyse effects of hyperparameter variations (10 points)

(Figures 1-3: Predicted 3D mesh with w_smooth = 0.1 | w_smooth = 1.5 | w_smooth = 4)
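The weight trades off the Chamfer data term against the Laplacian regularizer. A minimal sketch of the combined objective, assuming the training loss is a weighted sum of the two terms from section 1 (the weight and variable names are assumptions):

# A larger w_smooth favors smoother, blobbier surfaces at the cost of geometric
# detail; a smaller one preserves detail but allows spiky, irregular triangles.
import pytorch3d.ops

def mesh_loss(mesh_pred, gt_points, w_chamfer=1.0, w_smooth=0.1):
    pred_points = pytorch3d.ops.sample_points_from_meshes(mesh_pred, 5000)
    loss_data = chamfer_loss(pred_points, gt_points)  # from section 1.2
    loss_reg = smoothness_loss(mesh_pred)             # from section 1.3
    return w_chamfer * loss_data + w_smooth * loss_reg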

Observations:

2.6. Interpret your model (15 points)

Run Command

python main.py -q q2.6

I interpreted the Point Cloud decoder model using feature embeddings. I wanted to test the following hypothesis:

To test this hypothesis, I followed the steps below:

Type                             Image idx in test set
Arm Chair                        0
Dining Chair                     1
Club Chair (large seat space)    120
Chair with base support          140
Chair with slant legs            165

K-closest Images
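A minimal sketch of how the K closest images could be retrieved in encoder-feature space, assuming embeddings from the (frozen) image encoder are precomputed (the similarity measure and names are assumptions; the actual procedure follows the steps above):

# Hypothetical retrieval: rank all test images by cosine similarity of their
# encoder embeddings to the query image's embedding.
import torch
import torch.nn.functional as F

def k_closest_images(query_feat, all_feats, k=5):
    # query_feat: (D,) embedding of the query image; all_feats: (N, D) embeddings
    sims = F.cosine_similarity(query_feat.unsqueeze(0), all_feats, dim=1)  # (N,)
    return torch.topk(sims, k).indices  # indices of the K most similar images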

Observations: