16-889 Assignment 2: Single View to 3D

Name: Manikandtan Chiral Kunjunni Kartha
AndrewID: mchiralk
Supporting files: Google Drive

EXTRA DAYS: 4 (late days)

Goals: In this assignment, you will explore loss and decoder functions for regressing to voxel, point cloud, and mesh representations from single-view RGB input.

1.1. Fitting a voxel grid (5 points)

In this subsection, we will define a binary cross-entropy loss that can help us fit a 3D binary voxel grid. Define the loss function in the losses.py file. For this, you can use the pre-defined losses in the PyTorch library.

implementation

import torch

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src holds raw logits; apply a sigmoid before taking
    # binary cross-entropy against the binary occupancy targets
    sigmoid = torch.nn.Sigmoid()
    loss = torch.nn.BCELoss()(sigmoid(voxel_src), voxel_tgt)
    return loss
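
An equivalent, numerically stabler variant (an optional alternative, not the submitted code) folds the sigmoid into the loss with BCEWithLogitsLoss:

def voxel_loss_logits(voxel_src, voxel_tgt):
    # BCEWithLogitsLoss fuses the sigmoid and the cross-entropy into
    # one numerically stable op; voxel_src stays raw logits
    return torch.nn.BCEWithLogitsLoss()(voxel_src, voxel_tgt)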

run

python main.py --q 1.1

Predicted and Ground truth voxel grid

fit vox

1.2. Fitting a point cloud (10 points)

In this subsection, we will define the chamfer loss that can help us fit a 3D point cloud. Define the loss function in the losses.py file. We expect you to write your own code for this and not use any PyTorch3D utilities. You are allowed to use functions inside pytorch3d.ops.knn such as knn_gather or knn_points.

implementation

from pytorch3d.ops import knn_points

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # squared distance from each point to its nearest neighbour in the
    # other cloud (knn_points with the default K=1)
    dist1 = knn_points(point_cloud_src, point_cloud_tgt).dists
    dist2 = knn_points(point_cloud_tgt, point_cloud_src).dists
    # average the two directional terms into a single scalar
    combined = torch.cat([dist1, dist2])
    loss_chamfer = torch.mean(combined)
    return loss_chamfer
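
As a sanity check (not part of the submission), the result can be compared against PyTorch3D's built-in chamfer distance; the built-in sums the two directional means, so it should come out at roughly twice this implementation's overall mean:

from pytorch3d.loss import chamfer_distance

src = torch.rand(2, 1000, 3)
tgt = torch.rand(2, 1000, 3)
ours = chamfer_loss(src, tgt)
ref, _ = chamfer_distance(src, tgt)  # returns (loss, normals loss)
print(ours.item(), ref.item())       # expect ref approx. 2 * ours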

run

python main.py --q 1.2

Predicted and Ground truth point cloud

fit point

1.3. Fitting a mesh (5 points)

In this subsection, we will define an additional smoothing loss that can help us fit a mesh. Define the loss function in the losses.py file.

For this, you can use the pre-defined losses in the PyTorch3D library.

implementation

from pytorch3d.loss import mesh_laplacian_smoothing

def smoothness_loss(mesh_src):
    # laplacian regularizer: penalizes vertices far from their neighbours' centroid
    loss_laplacian = mesh_laplacian_smoothing(mesh_src)
    return loss_laplacian
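
When fitting a mesh, this regularizer is combined with the chamfer loss on points sampled from the predicted and target meshes. A minimal sketch of the combination, reusing chamfer_loss from 1.2; the weight w_smooth and sample count are illustrative, the actual values live in the training script:

from pytorch3d.ops import sample_points_from_meshes

def mesh_fitting_loss(mesh_src, mesh_tgt, w_smooth=0.1, n_samples=5000):
    # sample comparable point clouds from both meshes, then add the
    # laplacian term to keep the predicted surface smooth
    pts_src = sample_points_from_meshes(mesh_src, n_samples)
    pts_tgt = sample_points_from_meshes(mesh_tgt, n_samples)
    return chamfer_loss(pts_src, pts_tgt) + w_smooth * smoothness_loss(mesh_src)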

run

python main.py --q 1.3

Predicted and Ground truth mesh

fit mesh

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

In this subsection, we will define a neural network to decode image features into a binary voxel grid.

implementation

import torch.nn as nn

class voxelNet(nn.Module):
    # single linear layer decoding image features to 32^3 occupancy logits
    def __init__(self, input_dim, output_dim, device):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, output_dim, device=device)

    def forward(self, x):
        b = x.shape[0]
        out = self.fc1(x)
        # reshape flat logits into a (B, 1, 32, 32, 32) voxel grid
        out = out.reshape((b, 1, 32, 32, 32))
        return out
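
A minimal usage sketch, reusing voxel_loss from 1.1; the 512-dim feature size (e.g. a ResNet-18 encoder output) and the random dummy targets are assumptions for illustration:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = voxelNet(input_dim=512, output_dim=32 * 32 * 32, device=device)
feats = torch.rand(4, 512, device=device)   # batch of encoded images
logits = model(feats)                       # (4, 1, 32, 32, 32) raw logits
loss = voxel_loss(logits, torch.rand_like(logits).round())  # dummy binary targets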

run

python main.py --q 2.1
image 3D model - predicted, GT
21 1a 211b
21 2a 212b
21 3a 213b

2.2. Image to point cloud (15 points)

In this subsection, we will define a neural network to decode image features into a point cloud.

implementation

class pointNet(nn.Module):
    # two-layer MLP decoding image features to num_points 3D coordinates
    def __init__(self, input_dim, num_points, device):
        super().__init__()
        self.num_points = num_points
        self.fc1 = nn.Linear(input_dim, num_points, device=device)
        self.fc2 = nn.Linear(num_points, num_points * 3, device=device)
        self.relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        b = x.shape[0]
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        # reshape the flat output into (B, num_points, 3) coordinates
        out = out.reshape((b, self.num_points, 3))
        return out
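
Usage sketch, reusing chamfer_loss from 1.2; the 512-dim features, the 5000-point budget, and the random target are assumptions for illustration:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = pointNet(input_dim=512, num_points=5000, device=device)
feats = torch.rand(4, 512, device=device)
pts = model(feats)                          # (4, 5000, 3) predicted cloud
loss = chamfer_loss(pts, torch.rand(4, 5000, 3, device=device))  # dummy target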

run

python main.py --q 2.2
image 3D model - predicted, GT
22 1a 221b
22 2a 222b
22 3a 223b

2.3. Image to mesh (15 points)

In this subsection, we will define a neural network to decode image features into a mesh.

implementation

class MeshNet(nn.Module):
    # four-layer MLP decoding image features to per-vertex offsets
    # of a fixed-topology source mesh
    def __init__(self, input_dim, num_verts, device):
        super().__init__()
        self.num_verts = num_verts
        self.fc1 = nn.Linear(input_dim, 4096, device=device)
        self.fc2 = nn.Linear(4096, 4096, device=device)
        self.fc3 = nn.Linear(4096, num_verts, device=device)
        self.fc4 = nn.Linear(num_verts, num_verts * 3, device=device)
        self.relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        b = x.shape[0]
        out = self.relu(self.fc1(x))
        out = self.relu(self.fc2(out))
        out = self.relu(self.fc3(out))
        out = self.relu(self.fc4(out))
        # flattened (B, num_verts * 3) vertex offsets
        out = out.reshape((b, self.num_verts * 3))
        return out
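
The flattened output is interpreted as per-vertex offsets of a template mesh. A sketch of applying them, assuming the default ico_sphere initialization (level 4, 2562 vertices) and 512-dim features:

from pytorch3d.utils import ico_sphere

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
b = 4
src_mesh = ico_sphere(4, device).extend(b)  # 2562 verts per mesh
model = MeshNet(input_dim=512, num_verts=2562, device=device)
feats = torch.rand(b, 512, device=device)
offsets = model(feats)                      # (b, 2562 * 3)
pred_mesh = src_mesh.offset_verts(offsets.reshape(-1, 3))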

run

python main.py --q 2.3
image 3D model - predicted, GT
23 1a 231b
23 2a 232b
23 3a 233b

2.4. Quantitative comparisons (10 points)

Quantitatively compare the F1 score of 3D reconstruction for meshes vs. point clouds vs. voxel grids.

Type     F1 Score @ 0.05
vox      72.615
point    92.762
mesh     86.090

run

python main.py --q 2.4
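
For reference, a minimal sketch of how F1@0.05 can be computed from nearest-neighbour distances between predicted and ground-truth points (the grading script's exact implementation may differ):

def f_score(pred_pts, gt_pts, threshold=0.05):
    # pred_pts, gt_pts: (1, P, 3); knn_points returns squared distances
    d_pred = knn_points(pred_pts, gt_pts).dists.sqrt()   # pred -> gt
    d_gt = knn_points(gt_pts, pred_pts).dists.sqrt()     # gt -> pred
    precision = (d_pred < threshold).float().mean()
    recall = (d_gt < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)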

voxel grid

Has the lowest F1 score. The voxel grid has a resolution of 32x32x32, which is quite coarse for thin/narrow sections of the model, so a deviation of even a few voxels causes a large error.

point cloud

Has the highest F1 score. Unlike meshes and voxel grids, the points are predicted directly with no intermediate conversions, which helps them score better.

mesh

Has a lower score than point clouds. This could be because of the constraints on vertex neighborhoods and face connectivity. Thin and narrow mesh shapes, especially chair legs, are also inherently harder to generate and could lower the score.

2.5. Analyse effects of hyperparameter variations (10 points)

Analyse the results, by varying a hyperparameter of your choice.

I experimented with different initializations for the mesh representation.

I tried using a torus and a chair model from Free3D, and visualized the results.

torus mesh initialization

from pytorch3d.utils import torus
mesh_pred = torus(0.5, 1.5, 10, 10, self.device)  # r, R, sides, rings

24 torus initial

image 3D model - predicted, GT
24 torus1a 24torus1b
24 torus 2a 24torus2b

Chair mesh initialization

from pytorch3d.io import load_objs_as_meshes
mesh_pred = load_objs_as_meshes(['Chair_wooden.obj'], self.device)

24 chair initial

image 3D model - predicted, GT
24 chair1a 24chair1b
24 chair 2a 24chair2b

2.6. Interpret your model (15 points)

I ran a t-SNE visualization on the features from the final fully connected layer of the PointNet decoder. I wanted to see whether the distribution of the feature embedding would follow any recognizable pattern. Below is the t-SNE visualization.

TSNE
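
A sketch of how such features can be collected and embedded; the hook target (fc2, the decoder's last fully connected layer), the encoder, and the loader names are illustrative, with sklearn's TSNE doing the 2D projection:

from sklearn.manifold import TSNE
import numpy as np

feats = []
def hook(module, inp, out):
    feats.append(out.detach().cpu().numpy())

handle = model.fc2.register_forward_hook(hook)  # last FC layer of the decoder
for images in eval_loader:                      # illustrative eval loader
    model(encoder(images))
handle.remove()

X = np.concatenate(feats, axis=0)               # (N, feature_dim)
emb = TSNE(n_components=2).fit_transform(X.reshape(X.shape[0], -1))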

What I observed is that while there is high-level clustering based on object color, many of the structural features of the chairs were similar for embeddings close to one another.

Clusters in the top right share a squared-off back design as a common theme. Similarly, chairs with flat backs, arches on top, and different spokes for the base can be seen at the bottom. Some clusters also seem to form with respect to the chair's orientation in the image.