16-889 Assignment 2: Single View to 3D

Name: Meghana Reddy Ganesina

Andrew ID: mganesin

2-Grace Days

1. Exploring loss functions

1.1. Fitting a voxel grid (5 points)

Command to run: python main.py --question_number q1_1

[Figure: optimized voxel grid and ground truth voxel grid]

1.2. Fitting a point cloud (10 points)

Command to run: python main.py --question_number q1_2

[Figure: optimized point cloud and ground truth point cloud]

1.3. Fitting a mesh (5 points)

Command to run: python main.py --question_number q1_3

[Figure: optimized mesh and ground truth mesh]

2. Reconstructing 3D from single view

2.1. Image to voxel grid (15 points)

Commands to run: python main.py --question_number q2_1

Architecture:

import torch.nn as nn

class VoxDecoder(nn.Module):
    def __init__(self):
        super(VoxDecoder, self).__init__()
        self.fc1 = nn.Linear(512, 1024)
        self.relu = nn.PReLU()
        self.fc2 = nn.Linear(1024, 32*32*32)
        self.model = nn.Sequential(self.fc1, self.relu, self.fc2)

    def forward(self, x):
        # x: (N, 512) input features -> (N, 1, 32, 32, 32) voxel grid
        x = self.model(x)
        x = x.reshape([-1, 1, 32, 32, 32])
        return x

[Figure: three examples, each showing the ground truth image, the ground truth mesh, and the predicted voxel grid]

2.2. Image to point cloud (15 points)

Commands to run: python main.py --question_number q2_2

Architecture:

import torch.nn as nn

class PointDecoder(nn.Module):
    def __init__(self, n_points):
        super(PointDecoder, self).__init__()
        self.n_points = n_points
        self.fc1 = nn.Linear(512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc6 = nn.Linear(4096, self.n_points*3)
        self.relu = nn.ReLU()
        self.layer1 = [self.fc1, self.relu]
        self.layer2 = [self.fc2, self.relu]
        self.layer6 = [self.fc6]
        self.model = nn.Sequential(*(self.layer1 + self.layer2 + self.layer6))

    def forward(self, x):
        # x: (N, 512) input features -> (N, n_points, 3) point cloud
        N = x.shape[0]
        x = self.model(x)
        x = x.reshape([N, -1, 3])
        return x

[Figure: three examples, each showing the ground truth image, the ground truth mesh, and the predicted point cloud]

2.3. Image to mesh (15 points)

Commands to run: python main.py --question_number q2_3

Architecture:

import torch.nn as nn

class MeshDecoder(nn.Module):
    def __init__(self):
        super(MeshDecoder, self).__init__()
        self.fc1 = nn.Linear(512, 2048)
        self.fc2 = nn.Linear(2048, 4096)
        self.fc3 = nn.Linear(4096, 4096)
        self.fc4 = nn.Linear(4096, 4096)
        self.fc5 = nn.Linear(4096, 7686)
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()
        self.layer1 = [self.fc1, self.relu]
        self.layer2 = [self.fc2, self.relu]
        self.layer3 = [self.fc3, self.relu]
        self.layer4 = [self.fc4, self.relu]
        self.layer5 = [self.fc5, self.tanh]
        self.model = nn.Sequential(*(self.layer1 + self.layer2 + self.layer3
                                     + self.layer4 + self.layer5))

    def forward(self, x):
        # x: (N, 512) input features -> (N, 7686) values in [-1, 1] (tanh)
        x = self.model(x)
        return x

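The output width of 7686 in fc5 is consistent with three coordinates for each of 2562 vertices, which is the vertex count of an icosphere after four subdivisions. The subdivision level is an assumption here, since the report only states that the mesh is obtained by deforming an ico_sphere. The helper name `icosphere_vertex_count` is hypothetical and only illustrates the arithmetic:

```python
def icosphere_vertex_count(level):
    """Vertex count of an icosahedron subdivided `level` times.

    An icosahedron has 12 vertices; each subdivision step adds one new
    vertex per edge, which gives the closed form 10 * 4**level + 2.
    """
    return 10 * 4 ** level + 2


# Assumed subdivision level (not stated in the report).
n_vertices = icosphere_vertex_count(4)
# Three values (x, y, z) per vertex.
n_outputs = 3 * n_vertices
```

Under this assumption, the decoder output can be reshaped to (N, 2562, 3), one 3D vector per vertex of the sphere.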
[Figure: three examples, each showing the ground truth image, the ground truth mesh, and the predicted mesh]

2.4. Quantitative comparisons (10 points)

Representation      Voxel     Point cloud   Mesh
F1 Score @ 0.05     81.236    94.951        87.272

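The report does not spell out how the F1 score is computed. A common definition for point sets, assumed in the sketch below, combines precision (the share of predicted points within the threshold of some ground truth point) and recall (the share of ground truth points within the threshold of some predicted point), both in percent. The function name `f1_score_at_threshold` is hypothetical, and NumPy is used only to keep the example small:

```python
import numpy as np


def f1_score_at_threshold(pred_points, gt_points, threshold=0.05):
    """F1 score in percent between point sets of shape (P, 3) and (G, 3)."""
    # Pairwise Euclidean distances, shape (P, G).
    diff = pred_points[:, None, :] - gt_points[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    # Precision: share of predicted points with a ground truth point nearby.
    precision = 100.0 * np.mean(dists.min(axis=1) < threshold)
    # Recall: share of ground truth points with a predicted point nearby.
    recall = 100.0 * np.mean(dists.min(axis=0) < threshold)
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For voxel and mesh predictions, this kind of metric is usually evaluated on points sampled from the predicted shape, which is also an assumption here.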
  1. The visualizations above suggest that the voxel representation is the hardest to learn. Chairs with thin structures such as legs are especially difficult to reconstruct, and the predictions often contain disconnected components, as the example below shows. The 32x32x32 voxel grid is too coarse to capture such fine details.

[Figure: ground truth mesh and predicted voxel grid]

  2. The mesh representation fails on chairs with holes or voids, because every predicted mesh is obtained by deforming an ico_sphere and moving vertices cannot change the topology of the sphere. Points are sampled from the faces of the predicted mesh, and the chamfer distance between these points and the ground truth is used as the loss. Correcting the overall structure by moving individual vertices to reduce this loss is a harder optimization problem.

  3. Point clouds are the easiest representation to learn and render. Because each point moves independently, the points have the flexibility to capture the complex and intricate structures of the chairs, unlike the other representations.
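The chamfer distance mentioned for the mesh case can be sketched as follows. This is a minimal NumPy version that assumes squared distances and mean reduction, which is one common convention; the report does not say which variant was used, and the function name `chamfer_distance` is illustrative:

```python
import numpy as np


def chamfer_distance(points_a, points_b):
    """Symmetric chamfer distance between point sets (A, 3) and (B, 3)."""
    # Pairwise squared Euclidean distances, shape (A, B).
    diff = points_a[:, None, :] - points_b[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Mean squared distance from each point in A to its nearest point in B,
    # plus the same term from B to A.
    a_to_b = sq_dists.min(axis=1).mean()
    b_to_a = sq_dists.min(axis=0).mean()
    return a_to_b + b_to_a
```

Each term only looks at nearest neighbours, so the loss gives no direct signal about holes or connectivity, which matches the difficulty described for meshes above.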

2.5. Analyse effects of hyperparameter variations (10 points)

Commands to run: python main.py --question_number q1_2

I analyzed the effects of two hyperparameters, w_smooth and n_points, on the predicted mesh.

a) w_smooth: As w_smooth was increased, the edges of the predicted meshes became smoother (see the figure below). The loss function contains a regularization term that encourages smoothness, and w_smooth controls the weight of that term.

[Figure: meshes predicted with w_smooth = 0.1, w_smooth = 1.5, and w_smooth = 4]
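The role of w_smooth can be illustrated with a weighted sum of a data term and a smoothness term. The report does not name the regularizer, so the sketch below assumes a uniform Laplacian penalty (the distance between each vertex and the centroid of its neighbours). The names `uniform_laplacian_loss`, `total_mesh_loss`, and `w_chamfer` are illustrative, not taken from the report:

```python
import numpy as np


def uniform_laplacian_loss(vertices, faces):
    """Mean distance between each vertex and the centroid of its neighbours.

    vertices: (V, 3) float array, faces: (F, 3) integer array of triangles.
    """
    # Collect the undirected edges of all triangles, without duplicates.
    edges = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)
    neighbour_sum = np.zeros(vertices.shape, dtype=float)
    degree = np.zeros(vertices.shape[0])
    for i, j in edges:
        neighbour_sum[i] += vertices[j]
        neighbour_sum[j] += vertices[i]
        degree[i] += 1
        degree[j] += 1
    # Ignore vertices that do not belong to any face.
    used = degree > 0
    centroid = neighbour_sum[used] / degree[used][:, None]
    return np.linalg.norm(centroid - vertices[used], axis=1).mean()


def total_mesh_loss(loss_chamfer, loss_smooth, w_chamfer=1.0, w_smooth=0.1):
    """Weighted sum of the data term and the smoothness regularizer."""
    return w_chamfer * loss_chamfer + w_smooth * loss_smooth
```

With a larger w_smooth, the optimizer trades reconstruction accuracy for a smaller smoothness penalty, which is consistent with the smoother edges reported above.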

b) n_points: As the number of points sampled from the mesh faces increased, the rendered meshes captured more detail than those trained with fewer points. As shown below, the chair rendered with the larger n_points captures more intricate structures.

[Figure: meshes predicted with n_points = 5000 and n_points = 10000]
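Sampling n_points from the faces of a mesh can be sketched as below. This is a minimal NumPy version that assumes area-weighted face selection followed by uniform barycentric sampling, which is the usual approach; the report does not show the sampling code, and the function name `sample_points_from_mesh` and the `seed` parameter are illustrative:

```python
import numpy as np


def sample_points_from_mesh(vertices, faces, n_points, seed=0):
    """Sample n_points uniformly from the surface of a triangle mesh.

    vertices: (V, 3) float array, faces: (F, 3) integer array of triangles.
    """
    rng = np.random.default_rng(seed)
    triangles = vertices[faces]  # (F, 3, 3)
    # Triangle areas from the cross product of two edge vectors.
    edge_1 = triangles[:, 1] - triangles[:, 0]
    edge_2 = triangles[:, 2] - triangles[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(edge_1, edge_2), axis=1)
    # Pick faces with probability proportional to their area.
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick.
    u = np.sqrt(rng.random(n_points))
    v = rng.random(n_points)
    w0, w1, w2 = 1.0 - u, u * (1.0 - v), u * v
    chosen = triangles[face_idx]
    return (w0[:, None] * chosen[:, 0]
            + w1[:, None] * chosen[:, 1]
            + w2[:, None] * chosen[:, 2])
```

More samples cover the surface more densely, so a loss computed on them reflects small structures more reliably, at the cost of more computation per step.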

2.6. Interpret your model (15 points)

Commands to run: python main.py --question_number q2_6

Taking inspiration from my previous work, I applied principal component analysis (PCA) to the features of the penultimate layer of the network, i.e., the fully connected layer before the output layer. This analysis gives an idea of what the model has learnt about representing chairs.

a) I fit PCA on the penultimate-layer features of the training data and used it to transform the penultimate-layer features of the test data.

b) For each component of the reduced feature set, I sorted the test dataset by the value of that component in the transformed test features.

c) I chose n_components = 10. On further analysis, I was able to identify components that correspond to the number of seats, the width of the base of the chair, and the height of the chair.
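The three steps above can be sketched as follows. This is a minimal NumPy version that computes PCA through an SVD of the centred training features; the report does not say which PCA implementation was used, and the function names `fit_pca`, `transform`, and `top_and_bottom` are illustrative:

```python
import numpy as np


def fit_pca(train_features, n_components=10):
    """Fit PCA on training features of shape (N, D)."""
    mean = train_features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(train_features - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(features, mean, components):
    """Project features of shape (M, D) onto the fitted components."""
    return (features - mean) @ components.T


def top_and_bottom(scores, component, k=5):
    """Indices of the k largest and k smallest scores along one component."""
    order = np.argsort(scores[:, component])
    return order[::-1][:k], order[:k]
```

Here `train_features` and `test_features` would be the penultimate-layer activations collected over the training and test sets, and the returned indices select the test images to display for each component.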

Below are visualizations for a few of these features (the top 5 and bottom 5 images for each feature):

Width of the base of the chair:

a. The following are the top 5 images in the test set when sorted along this feature. As we can see, the chairs with a circular base are grouped together.

[Figure: top 5 test images for this feature]

b. The following are the bottom 5 images in the test set when sorted along this feature. As we can see, the chairs with a wider base are grouped together.

[Figure: bottom 5 test images for this feature]

Number of seats of the chair:

a. The following are the top 5 images in the test set when sorted along this feature. As we can see, the chairs with more than one seat are grouped together.

[Figure: top 5 test images for this feature]

b. The following are the bottom 5 images in the test set when sorted along this feature. As we can see, the chairs with a single seat are grouped together.

[Figure: bottom 5 test images for this feature]

Height of the chair:

a. The following are the top 5 images in the test set when sorted along this feature. As we can see, the chairs with longer legs are grouped together.

[Figure: top 5 test images for this feature]

b. The following are the bottom 5 images in the test set when sorted along this feature. As we can see, the chairs with shorter legs are grouped together.

[Figure: bottom 5 test images for this feature]