Name: Meghana Reddy Ganesina
Andrew ID: mganesin
Command to run: python main.py --question_number q1_1
| Optimized Voxel Grid | Ground Truth Voxel Grid |
| --- | --- |
| ![]() | ![]() |
Command to run: python main.py --question_number q1_2
| Optimized Point Cloud | Ground Truth Point Cloud |
| --- | --- |
| ![]() | ![]() |
Command to run: python main.py --question_number q1_3
| Optimized Mesh | Ground Truth Mesh |
| --- | --- |
| ![]() | ![]() |
Commands to run: python main.py --question_number q2_1
Architecture:
```python
import torch.nn as nn

class VoxDecoder(nn.Module):
    def __init__(self):
        super(VoxDecoder, self).__init__()
        self.fc1 = nn.Linear(512, 1024)
        self.relu = nn.PReLU()
        self.fc2 = nn.Linear(1024, 32 * 32 * 32)
        self.model = nn.Sequential(self.fc1, self.relu, self.fc2)

    def forward(self, x):
        x = self.model(x)
        # reshape flat logits to a (N, 1, 32, 32, 32) occupancy grid
        x = x.reshape([-1, 1, 32, 32, 32])
        return x
```
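The decoder outputs raw occupancy logits, so training can use a binary cross-entropy loss against the ground-truth grid. A minimal sketch (the function name `voxel_loss` is my own):

```python
import torch
import torch.nn.functional as F

def voxel_loss(pred_logits, gt_voxels):
    """BCE between predicted occupancy logits and binary ground truth.

    pred_logits: (N, 1, 32, 32, 32) raw decoder outputs
    gt_voxels:   (N, 1, 32, 32, 32) occupancy in {0, 1}
    """
    return F.binary_cross_entropy_with_logits(pred_logits, gt_voxels)

# toy check: confident correct predictions give near-zero loss
gt = torch.randint(0, 2, (2, 1, 32, 32, 32)).float()
logits = (gt * 2 - 1) * 10.0          # +10 where occupied, -10 where empty
print(voxel_loss(logits, gt).item())  # ~4.5e-05
```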
| Ground Truth image | Ground Truth Mesh | Predicted Voxel grid |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Commands to run: python main.py --question_number q2_2
Architecture:
```python
import torch.nn as nn

class PointDecoder(nn.Module):
    def __init__(self, n_points):
        super(PointDecoder, self).__init__()
        self.n_points = n_points
        self.fc1 = nn.Linear(512, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc6 = nn.Linear(4096, self.n_points * 3)
        self.relu = nn.ReLU()
        self.model = nn.Sequential(self.fc1, self.relu,
                                   self.fc2, self.relu,
                                   self.fc6)

    def forward(self, x):
        N = x.shape[0]
        x = self.model(x)
        # reshape flat output to (N, n_points, 3) xyz coordinates
        x = x.reshape([N, -1, 3])
        return x
```
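The point decoder is trained with the chamfer loss between predicted and ground-truth clouds. A naive O(P1·P2) sketch of that loss (pytorch3d's `chamfer_distance` is the efficient version used in practice):

```python
import torch

def chamfer_distance(p1, p2):
    """Symmetric chamfer distance between two batched point clouds.

    p1: (N, P1, 3), p2: (N, P2, 3)
    """
    d = torch.cdist(p1, p2)                      # (N, P1, P2) pairwise distances
    d1 = d.min(dim=2).values.pow(2).mean(dim=1)  # each p1 point -> nearest in p2
    d2 = d.min(dim=1).values.pow(2).mean(dim=1)  # each p2 point -> nearest in p1
    return (d1 + d2).mean()

pts = torch.rand(1, 100, 3)
print(chamfer_distance(pts, pts).item())  # ~0 for identical clouds
```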
| Ground Truth image | Ground Truth Mesh | Predicted Point Cloud |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
Commands to run: python main.py --question_number q2_3
Architecture:
```python
import torch.nn as nn

class MeshDecoder(nn.Module):
    def __init__(self):
        super(MeshDecoder, self).__init__()
        self.fc1 = nn.Linear(512, 2048)
        self.fc2 = nn.Linear(2048, 4096)
        self.fc3 = nn.Linear(4096, 4096)
        self.fc4 = nn.Linear(4096, 4096)
        # 7686 = 2562 x 3: per-vertex xyz offsets for a level-4 ico_sphere
        self.fc5 = nn.Linear(4096, 7686)
        self.relu = nn.ReLU()
        self.tanh = nn.Tanh()
        self.model = nn.Sequential(self.fc1, self.relu,
                                   self.fc2, self.relu,
                                   self.fc3, self.relu,
                                   self.fc4, self.relu,
                                   self.fc5, self.tanh)

    def forward(self, x):
        # tanh bounds the predicted per-vertex offsets to [-1, 1]
        x = self.model(x)
        return x
```
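The flat output is interpreted as per-vertex offsets applied to the source ico_sphere. A minimal sketch of that step (the function name `deform_template` is my own; in practice the offsets would be passed to pytorch3d's `Meshes.offset_verts`):

```python
import torch

def deform_template(offsets_flat, template_verts):
    """Apply predicted per-vertex offsets to a template mesh.

    offsets_flat:   (N, V*3) tanh-bounded decoder output
    template_verts: (V, 3) vertices of the source ico_sphere
    Returns (N, V, 3) deformed vertex positions.
    """
    N = offsets_flat.shape[0]
    offsets = offsets_flat.reshape(N, -1, 3)   # (N, V, 3)
    return template_verts.unsqueeze(0) + offsets

# toy check with a stand-in template (2562 verts, like ico_sphere level 4)
template = torch.rand(2562, 3)
out = deform_template(torch.zeros(2, 7686), template)
print(out.shape)  # torch.Size([2, 2562, 3])
```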
| Ground Truth image | Ground Truth Mesh | Predicted Mesh |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| ![]() | ![]() | ![]() |
| Representation | Voxel | Point cloud | Mesh |
| --- | --- | --- | --- |
| F1 Score @ 0.05 | 81.236 | 94.951 | 87.272 |
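For reference, the F1 @ 0.05 metric above can be sketched as follows: a predicted point counts as correct if its nearest ground-truth point lies within the threshold, and vice versa for recall (a minimal version; the percentage scaling matches the table):

```python
import torch

def f1_score(pred_pts, gt_pts, thresh=0.05):
    """F1 @ threshold between predicted and ground-truth point sets.

    pred_pts: (P, 3), gt_pts: (G, 3). Returns a percentage.
    """
    d = torch.cdist(pred_pts.unsqueeze(0), gt_pts.unsqueeze(0)).squeeze(0)
    precision = (d.min(dim=1).values < thresh).float().mean()
    recall = (d.min(dim=0).values < thresh).float().mean()
    return (2 * precision * recall / (precision + recall + 1e-8)) * 100

pts = torch.rand(500, 3)
print(f1_score(pts, pts).item())  # ~100.0 for identical sets
```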
| Ground Truth Mesh | Predicted Voxel Grid |
| --- | --- |
| ![]() | ![]() |
The mesh representation fails to learn chairs with holes/voids in their structure because meshes are learnt by deforming an ico_sphere, which cannot change topology. Points are sampled from the faces of the predicted mesh and the chamfer distance is then computed with respect to the ground truth; correcting the overall structure by moving all the vertices to reduce this loss is a harder task.
Point clouds are the easiest representation to learn and render. The free-floating points provide a lot of flexibility to deform and can capture the complex, intricate structures of the chairs, unlike the other representations.
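The face-sampling step described above can be sketched with area-weighted barycentric sampling (the function name is my own; pytorch3d's `sample_points_from_meshes` plays this role in practice):

```python
import torch

def sample_points_from_faces(verts, faces, n_points):
    """Uniformly sample points on triangle faces, weighted by face area.

    verts: (V, 3) float, faces: (F, 3) long. Returns (n_points, 3).
    """
    v0, v1, v2 = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    areas = 0.5 * torch.cross(v1 - v0, v2 - v0, dim=1).norm(dim=1)
    idx = torch.multinomial(areas, n_points, replacement=True)
    u, v = torch.rand(n_points, 1), torch.rand(n_points, 1)
    # reflect (u, v) so samples are uniform over each triangle
    flip = (u + v) > 1
    u, v = torch.where(flip, 1 - u, u), torch.where(flip, 1 - v, v)
    return v0[idx] + u * (v1[idx] - v0[idx]) + v * (v2[idx] - v0[idx])

# toy check: a single unit triangle in the z = 0 plane
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = torch.tensor([[0, 1, 2]])
pts = sample_points_from_faces(verts, faces, 1000)
print(pts.shape)  # torch.Size([1000, 3])
```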
Commands to run: python main.py --question_number q1_2
I analyzed the effects of hyperparameters such as w_smooth and n_points on the mesh results.
a) w_smooth: As the value of w_smooth was increased, the edges of the meshes were smoothened (see the figure below). The loss function has a regularization term that penalizes non-smooth surfaces, and its contribution is controlled by w_smooth.
| w_smooth = 0.1 | w_smooth = 1.5 | w_smooth = 4 |
| --- | --- | --- |
| ![]() | ![]() | ![]() |
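The smoothness term being weighted can be illustrated with a simple uniform-Laplacian stand-in (my own minimal version; pytorch3d's `mesh_laplacian_smoothing` is what training would actually use), where the total loss is `chamfer + w_smooth * smoothness`:

```python
import torch

def laplacian_smoothness(verts, faces):
    """Mean squared distance from each vertex to the centroid of its
    neighbours -- a simple stand-in for the smoothness regularizer."""
    V = verts.shape[0]
    adj = torch.zeros(V, V)
    for f in faces:
        for a, b in [(0, 1), (1, 2), (2, 0)]:
            adj[f[a], f[b]] = adj[f[b], f[a]] = 1.0
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    centroid = adj @ verts / deg
    return (verts - centroid).pow(2).sum(dim=1).mean()

# toy check: one triangle, each vertex vs. the midpoint of the other two
verts = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
faces = torch.tensor([[0, 1, 2]])
print(laplacian_smoothness(verts, faces).item())  # 1.0
```

Increasing w_smooth raises the cost of jagged surfaces relative to the chamfer term, which is why the renders above flatten out.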
b) n_points: As the number of points sampled on the mesh faces increased, the rendered meshes captured more detail compared to meshes with fewer points. As we can see below, the chair rendered with a larger n_points captures more intricate structures.
| n_points = 5000 | n_points = 10000 |
| --- | --- |
| ![]() | ![]() |
Commands to run: python main.py --question_number q2_6
Taking inspiration from my previous work, I applied principal component analysis (PCA) to the features of the penultimate layer of the network, i.e., the fully connected layer before the output layer. This analysis gives an idea of what the model has learnt about representing the chairs.
a) I fit PCA on the penultimate-layer features of the training data and transform the penultimate-layer features of the test data.
b) I sort the test dataset by the feature activations obtained from this transform, for each feature in the reduced set.
c) I chose n_components = 10. On further analysis, I was able to extract features such as the number of seats, the width of the chair base, and the height of the chair.
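The fit/transform/sort procedure in steps a)–c) can be sketched as follows (an SVD-based stand-in for sklearn's `PCA`; variable names are my own):

```python
import torch

def pca_fit_transform(train_feats, test_feats, n_components=10):
    """Fit PCA on train features, project test features.

    train_feats: (N_train, D), test_feats: (N_test, D)
    Returns (N_test, n_components) activations.
    """
    mean = train_feats.mean(dim=0)
    # principal directions are right singular vectors of centered data
    U, S, Vh = torch.linalg.svd(train_feats - mean, full_matrices=False)
    components = Vh[:n_components]             # (n_components, D)
    return (test_feats - mean) @ components.T  # (N_test, n_components)

# sorting the test set along one learned direction
train = torch.randn(100, 512)
test = torch.randn(20, 512)
proj = pca_fit_transform(train, test)
order = proj[:, 0].argsort(descending=True)  # top-5: order[:5], bottom-5: order[-5:]
print(proj.shape)  # torch.Size([20, 10])
```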
Below are some visualizations for a few features (top 5 and bottom 5 images for the corresponding feature):
a. The following images are the top 5 images in the test set when sorted along this feature. All the chairs with a circular base are grouped together.
b. The following images are the bottom 5 images in the test set when sorted along this feature. All the chairs with a wider base are grouped together.
a. The following images are the top 5 images in the test set when sorted along this feature. All the chairs with more than one seat are grouped together.
b. The following images are the bottom 5 images in the test set when sorted along this feature. All the chairs with a single seat are grouped together.
a. The following images are the top 5 images in the test set when sorted along this feature. All the chairs with longer legs are grouped together.
b. The following images are the bottom 5 images in the test set when sorted along this feature. All the chairs with shorter legs are grouped together.