Assignment 2 (16889)

Akshunn Jindal (akshunnj)

Note: Some results are shown with a "wiggly" rendering style that I devised to make the mesh easier to see.

Question 1.1

Left: ground truth, Right: prediction

python fit_data.py --type 'vox'
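The write-up does not state which loss fit_data.py minimizes for voxels. Assuming per-voxel binary cross-entropy on occupancy logits (a common choice, not confirmed here), the following NumPy sketch shows the loss and a toy fit with one free logit per voxel. `voxel_bce_loss` and `fit_voxels` are hypothetical names, not functions from the assignment code.

```python
import numpy as np


def voxel_bce_loss(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean binary cross-entropy between occupancy logits and a 0/1 target grid.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    x = logits.astype(np.float64)
    y = target.astype(np.float64)
    per_voxel = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(per_voxel.mean())


def fit_voxels(target: np.ndarray, steps: int = 200, lr: float = 1.0) -> np.ndarray:
    """Fit one free logit per voxel by gradient descent on the summed BCE (hypothetical toy fit)."""
    logits = np.zeros(target.shape, dtype=np.float64)
    y = target.astype(np.float64)
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-logits))
        # d(BCE)/d(logit) = sigmoid(logit) - target, independently for each voxel
        logits -= lr * (prob - y)
    return logits
```

Thresholding the fitted logits at 0 recovers the target occupancy grid.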


Question 1.2

Left: ground truth, Right: prediction

python fit_data.py --type 'point'
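Point cloud fitting is commonly driven by the chamfer loss that the Question 2.4 discussion refers to. Below is a minimal NumPy sketch of one common convention (squared nearest-neighbour distances, averaged per set and summed over both directions). Whether fit_data.py uses exactly this form is an assumption, and `chamfer_distance` is a hypothetical name.

```python
import numpy as np


def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric chamfer distance between point sets p (N, 3) and q (M, 3).

    For each point, take the squared distance to its nearest neighbour in the
    other set; average within each direction and add the two directions.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

Because both directions are included, the loss penalizes predicted points far from the target as well as target regions left uncovered by the prediction.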


Question 1.3

Left: ground truth, Right: prediction

python fit_data.py --type 'mesh'
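Mesh fitting is often set up as chamfer distance on points sampled from the mesh surface plus a smoothness regularizer, since chamfer alone can produce jagged surfaces. The write-up does not name its regularizer; the sketch below assumes a uniform Laplacian penalty (mean distance from each vertex to the centroid of its neighbours), written in NumPy with the hypothetical name `uniform_laplacian_loss`.

```python
import numpy as np


def uniform_laplacian_loss(verts: np.ndarray, faces: np.ndarray) -> float:
    """Mean distance from each vertex to the centroid of its edge-connected neighbours.

    verts: (V, 3) float positions; faces: (F, 3) integer vertex indices.
    """
    neighbours = [set() for _ in range(len(verts))]
    for a, b, c in faces:
        # Every pair of vertices in a triangle shares an edge.
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))
    offsets = []
    for i, nbrs in enumerate(neighbours):
        if nbrs:  # isolated vertices contribute nothing
            centroid = verts[list(nbrs)].mean(axis=0)
            offsets.append(np.linalg.norm(centroid - verts[i]))
    return float(np.mean(offsets))
```

Pushing a vertex out of the plane of its neighbours increases this penalty, which is what discourages spiky deformations.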


Question 2.1

Left: input 2D image, Middle: ground truth, Right: prediction

python train_data.py --type 'vox'

python eval_data.py --type 'vox' --load_checkpoint


Question 2.2

Left: input 2D image, Middle: ground truth, Right: prediction

python train_data.py --type 'point'

python eval_data.py --type 'point' --load_checkpoint


Question 2.3

Left: input 2D image, Middle: ground truth, Right: prediction

python train_data.py --type 'mesh'

python eval_data.py --type 'mesh' --load_checkpoint


Question 2.4

Representation | Avg F1@0.05
Voxel          | 75.713
Point cloud    | 94.081
Mesh           | 91.050
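F1@0.05 compares two point sets at a distance threshold of 0.05: precision is the fraction of predicted points within 0.05 of some ground-truth point, recall is the fraction of ground-truth points within 0.05 of some predicted point, and F1 is their harmonic mean. A minimal NumPy sketch follows, assuming voxel and mesh outputs are first converted to sampled points and that the score is reported as a percentage, as the table values suggest. `f1_at_threshold` is a hypothetical name, and the evaluation script may differ in details such as `<` versus `<=`.

```python
import numpy as np


def f1_at_threshold(pred: np.ndarray, gt: np.ndarray, threshold: float = 0.05) -> float:
    """F1 score (in percent) between predicted (N, 3) and ground-truth (M, 3) points."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) Euclidean distances
    precision = (d.min(axis=1) < threshold).mean()  # predicted points close to the GT
    recall = (d.min(axis=0) < threshold).mean()     # GT points covered by the prediction
    if precision + recall == 0:
        return 0.0
    return float(100.0 * 2 * precision * recall / (precision + recall))
```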

The point cloud performs best because it places the fewest constraints on the predicted structure, compared with voxels or meshes. Voxels are constrained to a fixed resolution. Meshes are constrained by fixed connectivity, which fixes the topology.

The point cloud representation has no such constraints. Its main drawback is that it is hard to extract a mesh, which is a far more useful representation, from a point cloud.

The mesh representation ranks second on F1, but visually its outputs are among the worst. Vertex deformations cannot change the topology. Imagine a sphere that has to be deformed into a donut so as to minimize the chamfer loss: the best we can achieve is a red-blood-cell-shaped object that is slightly more depressed in the middle, which looks bad.

The catch is that F1 still comes out high because, somehow, fewer points are sampled in the hole regions. If I wanted a naive 3D model to sample more points in one region than another, regardless of their surface areas, I would warp the vertices so that multiple triangles overlap (one on top of the other) in a zig-zag fashion wherever more points are wanted, and do the opposite elsewhere.
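The argument above relies on how points are sampled from a mesh. A standard scheme picks faces with probability proportional to their area and then samples uniformly inside each chosen triangle, so stacking triangles on top of one another raises the sampling density over that footprint. The NumPy sketch below assumes that scheme (the write-up does not say which sampler is used) and uses exact duplicate triangles as a simple stand-in for the zig-zag folding described above. `sample_surface` and the two-square mesh are hypothetical.

```python
import numpy as np


def sample_surface(verts, faces, n, rng):
    """Sample n points on a triangle mesh, choosing faces with probability proportional to area."""
    tri = verts[faces]  # (F, 3, 3) triangle corner positions
    areas = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    face_idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # Square-root trick: uniform barycentric coordinates inside each triangle.
    u = np.sqrt(rng.random(n))
    v = rng.random(n)
    a, b, c = tri[face_idx, 0], tri[face_idx, 1], tri[face_idx, 2]
    points = (1 - u)[:, None] * a + (u * (1 - v))[:, None] * b + (u * v)[:, None] * c
    return points, face_idx


# Hypothetical demo: two unit squares in the z = 0 plane. The right square is
# covered three times by stacked duplicate triangles, so it has the same
# footprint as the left square but three times the total triangle area.
verts = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [2, 1, 0]], dtype=np.float64)
left = [[0, 1, 4], [0, 4, 3]]
right = [[1, 2, 5], [1, 5, 4]]
faces = np.array(left + right * 3)
points, _ = sample_surface(verts, faces, 20000, np.random.default_rng(0))
print("fraction of samples on the right square:", (points[:, 0] > 1.0).mean())  # roughly 0.75
```

Under area-weighted sampling, about three quarters of the points land on the right square even though both squares cover the same region of space.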

Voxels can perform well, but they cannot represent thin structures because of their limited resolution. Thin chair legs and armrests suffer as a result.
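The effect of limited resolution on thin parts can be seen with a toy voxelizer that fills a voxel only when its center lies inside the shape. This is an assumed voxelization rule, and the 0.02-thick "leg" and the resolutions 32 and 128 are hypothetical values, not taken from the assignment.

```python
import numpy as np


def voxelize_box(lo, hi, resolution):
    """Occupancy grid over [0, 1]^3: a voxel is filled if its center lies inside the box [lo, hi]."""
    centers = (np.arange(resolution) + 0.5) / resolution
    inside = [(centers >= lo[k]) & (centers <= hi[k]) for k in range(3)]
    # Broadcast the per-axis tests into a (res, res, res) boolean grid.
    return inside[0][:, None, None] & inside[1][None, :, None] & inside[2][None, None, :]


# Hypothetical chair leg: 0.02 units thick, spanning the full height of the unit cube.
leg_lo, leg_hi = (0.40, 0.40, 0.0), (0.42, 0.42, 1.0)
for res in (32, 128):
    print(res, int(voxelize_box(leg_lo, leg_hi, res).sum()))
# prints:
# 32 0
# 128 1152
```

At 32^3 the voxel size (1/32, about 0.031) exceeds the leg's width and no voxel center falls inside it, so the leg vanishes entirely; at 128^3 it survives as a 3 x 3 column of voxels.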


Question 2.5

Number of points | Avg F1@0.05
500              | 84.059
5000             | 94.081
10000            | 94.270

I varied the number of points in the predicted point cloud, running with 500, 5000, and 10000 points. Performance saturates around 10000 points: F1 rises by about 10 points from 500 to 5000 points, but by only about 0.2 from 5000 to 10000. Beyond this, the only way to improve performance is to change the inductive biases, for example through better layers for transforming the latent vector.


Question 2.6

I compute the 512-dimensional latent vector for every member of the test set and cluster these vectors into 10 clusters. I then pass each cluster center to the decoder. Each cluster center decodes to a different kind of chair.

This indicates that the latent space is well organized by chair type. That organization helps the subsequent decoder layers determine what type of chair is being reconstructed, for example whether the output should be biased toward a sofa or a regular chair.
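The clustering step described above can be sketched with plain k-means over the latent vectors. This NumPy version assumes Lloyd's algorithm with a deterministic farthest-point initialization; the write-up does not say which clustering implementation was used, and `kmeans`, `encode` and `decode` are hypothetical names.

```python
import numpy as np


def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x, k, iters=50):
    """Lloyd's k-means on latent vectors x of shape (N, D). Returns (centers (k, D), labels (N,))."""
    # Farthest-point initialization: start from x[0], then repeatedly add the
    # point whose distance to its nearest chosen center is largest.
    centers = [x[0]]
    for _ in range(1, k):
        d2 = ((x[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centers.append(x[d2.argmax()])
    centers = np.array(centers, dtype=np.float64)
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
    return centers, assign(x, centers)


# Hypothetical usage (encode/decode stand in for the trained model):
# latents = np.stack([encode(img) for img in test_images])  # (N, 512)
# centers, labels = kmeans(latents, k=10)
# shapes = [decode(c) for c in centers]
```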


Next, I take the two cluster centers that are farthest apart and morph one into the other through linear interpolation. In my visualization, the conversion takes a while to complete. This shows that all points along the interpolation line also decode to valid chairs.
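Picking the two farthest cluster centers and walking the straight line between them can be sketched as follows. Distances are Euclidean (an assumption), the number of interpolation steps is arbitrary, and `farthest_pair`, `interpolate` and `decode` are hypothetical names.

```python
import numpy as np


def farthest_pair(centers):
    """Indices (i, j) of the two cluster centers with the largest Euclidean distance."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    i, j = np.unravel_index(d.argmax(), d.shape)
    return int(i), int(j)


def interpolate(z0, z1, steps):
    """Latent codes on the straight line from z0 to z1, including both endpoints."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z0 + t * z1


# Hypothetical usage with centers from the clustering step:
# i, j = farthest_pair(centers)
# path = interpolate(centers[i], centers[j], steps=30)
# shapes = [decode(z) for z in path]
```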


Question 3.2

Left: input 2D image, Middle: ground truth, Right: prediction

Average F1@0.05 = 92.284 after 5000 training iterations

python train_data.py --type 'sheets'

python eval_data.py --type 'sheets' --load_checkpoint